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1  Executive  Summary 

This  report  summarizes  the  technical  details  of  our  work  under  the  University  Research  Initiative  (URI)  effort 
in  Automatic  Target  Detection/Recognition  (ATD/R).  At  the  outset  of  our  work,  a  long-term  goal  of  great 
concern  for  the  ATD/R  community  was  “revolutionizing  wide-area  imagery  analysis”.  We  structured  our 
work  with  this  ideal  in  mind,  focusing  on  two  enabling  technologies:  (1)  the  design  of  clutter  models  based 
on  Synthetic  Aperture  Radar  (SAR)  in  urban  environments;  and  (2)  fast  algorithms  for  Markov  Random 
Fields.  In  addition,  we  engaged  in  a  small,  yet  rewarding,  effort  in  which  we  developed  a  novel  method 
for  fitting  implicit  algebraic  curves  and  surfaces  to  two-  and  three-dimensional  data.  The  remainder  of  this 
section  is  devoted  to  an  overview  of  these  research  areas.  The  technical  details  of  our  research  are  discussed 
in  Sec.  2. 

1.1  SAR  Building  Detection  in  Urban  Environments 

Because  SAR  is  an  all-weather  sensor,  and  is  operable  both  night  and  day,  it  can  be  used  in  situations 
(cloud  cover,  smoke/fog,  darkness)  where  Electro-Optical  (EO)  sensors  are  practically  useless.  This  makes 
SAR  an  extremely  useful  tool  for  surveillance.  However,  since  radar  derived  imagery  was  historically  of  low 
resolution,  sophisticated  object  detection /recognition  schemes,  like  those  used  to  process  EO  images,  were 
not  available.  Recent  developments  in  sensor  technology  have  made  it  possible  to  obtain  multi-channel, 
high-resolution  SAR  imagery.  We  apply  sophisticated  machine  vision  algorithms  to  such  imagery  to  analyze 
and  detect  objects  of  interest. 

As  a  modality  of  choice  for  wide-area  surveillance,  SAR  has  enjoyed  a  good  deal  of  success  detecting 
targets  of  interest  in  natural  settings.  However,  when  employed  in  urban  environments,  SAR  performance 
simply  is  not  comparable.  Here,  the  presence  of  cultural,  non-target  but  target-like  objects  -  buildings, 
vehicles,  bridges  -  greatly  increases  the  number  of  false  alarms  the  SAR  system  reports.  Furthermore,  the 
SAR  signature  of  a  target  can  be  greatly  distorted  as  a  function  of  the  target’s  physical  location  with  respect 
to  a  non-target  object.  That  is,  the  SAR  signature  of  a  tank  positioned  in  front  of  the  leading  edge  of  a 
building  might  be  very  unlike  the  SAR  signature  of  a  tank  alone.  This  is  because  some  reflections  from 
the  tank  will  have  also  reflected  off  the  building  wall.  In  addition,  returns  from  the  building  alone  will 
arrive  roughly  coincidentally  with  the  reflections  from  the  tank  alone.  Successful  cultural  object  model 
development,  enabling  the  detection  of  a  variety  of  cultural  objects  of  interest,  would  contribute  to  the  goals 
of  MSTARS,  RADIUS,  and  other  Monitor  programs,  and  might  influence  others,  such  as  ARPA’s  Clipping 
Service. 

Ultimately,  it  is  necessary  to  have  SAR  based  models  for  the  complete  urban  environment.  For  this  effort, 
our  focus  centered  on  buildings,  in  particular,  models  for  detection  and  two-dimensional  geometry  estimation. 
Detecting  buildings,  as  well  as  estimating  building  geometry  and  appearance  in  SAR  imagery,  is  of  interest  for 
two  important  reasons:  (1)  Buildings  contribute  a  large  number  of  false  alarms.  By  detecting  buildings,  the 
false  alarm  rate  can  be  greatly  reduced;  and  (2)  Targets  stationed  in  close  proximity  to  buildings  can  easily 
be  missed  using  conventional  analysis  methods.  By  detecting  buildings  and  then  estimating  their  geometry 
and  SAR  appearance  models,  changes  in  appearance  from  stored  models  can  be  exploited  to  detect  the 
presence  of  an  inserted  target.  There  is  significant  skepticism  among  some  in  the  ATR  community  about 
approaching  practical  target  detection  in  any  other  way. 

Unlike  EO  imagery,  the  full  geometric  structure  of  a  building  is  not  present  in  SAR  imagery.  The  three 
prominent  features  of  a  building’s  SAR  signature  are:  (1)  bright  line(s),  resulting  from  the  wall-ground 
dihedral(s)  facing  the  radar;  (2)  shadow  regions  where  there  is  no  energy  return;  and  (3)  backscatter  from 
small  structures  on  the  roof  or  wall  of  a  building,  and/or  weak  backscatter  from  rough  surfaces. 

The  algorithm  we  developed  to  exploit  some  of  these  features  is  discussed  in  detail  in  Sec.  2.1. 
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1.2  Fast  Algorithms  for  Markov  Random  Fields 

Making  progress  in  the  second  key  technology  area  has  a  lot  of  practical  value  for  target-detection  matched 
filters  used  in  both  multi-spectral  infrared  and  synthetic-aperture  imagery.  It  is  also  very  important  for 
super-resolution  algorithms  that  improve  target  detection/classification  performance  by  removing  clutter 
from  these  images,  which  are  often  severely  degraded  by  speckle  and  environmental/propagation  effects. 
These  are  both  extremely  active  areas  of  research  for  many  workers  in  the  ATR  community,  partly  because 
many  new  sensors  such  as  the  Global  Hawk  and  Dark  Star  will  be  coming  on  line  in  the  next  few  years. 
In  addition,  existing  ATR  technologies  are  being  challenged  to  meet  more  stringent  requirements  for  target 
detection  and  correct-classification  probabilities,  given  a  specified  false  alarm  rate,  in  programs  and  ACTD’s 
such  as  SAIP,  MSTAR,  and  FOPEN.  The  latter,  a  foliage-penetration  low-frequency  SAR  program,  is  an 
excellent  example  of  the  increased  difficulty  of  detecting  and  classifying  targets  obscured  by  environmental 
effects.  These  effects  include  HF  radio  interference  as  well  as  the  stochastic  effects  of  scattering,  blurring  and 
layover  induced  by  the  foliage.  The  lowered  range  resolution,  due  to  the  smaller  available  radar  bandwidth  at 
low  frequency,  also  makes  the  ATR  problem  for  FOPEN  more  challenging  than  that  of  traditional  microwave- 
frequency  SARS,  most  of  which  operate  in  the  X-band  region  or  above.  The  fast  algorithms  discussed  in  this 
report  for  estimating  stochastic  target  models  are  applicable  to  target-detection  filters  for  these  problems, 
including  multidimensional  imagery  such  as  fully-polarimetric  or  multifrequency  SAR  (  [1],  [2],  [3]).  In 

addition,  they  are  also  applicable  to  the  challenges  of  resolution-enhancement,  and  of  removing  the  radio 
interference  by  use  of  adaptive  nulling. 

Other  examples  of  the  programmatic  relevance  of  the  fast-algorithms  part  of  our  research  can  be  taken 
from  the  set  of  issues  important  to  SAIP  (Semi- Automated  IMINIT  Processing).  This  is  an  ACTD  whose 
focus  is  to  relieve  the  mounting  pressure  on  human  analysts  who  are  being  tasked  to  analyze  SAR  imagery 
that  arrives  at  least  twice  the  traditional  rate,  while  the  available  number  of  analysts  is  also  shrinking 
by  a  factor  of  2  to  3.  More  efficient  target-detection  algorithms  can  ease  the  computational  load  in  the 
current  system,  whose  single  biggest  contribution  comes  from  the  super-resolution  algorithms.  In  addition, 
many  such  SAR  systems  are  stressed  by  the  limited  bandwidth  available  in  existing  communication  links. 
Stochastic  models  such  as  the  standard  auto-regressive  models,  or  the  interpolation  models  discussed  in  our 
research,  can  be  used  for  image  compression,  a  challenge  that  will  be  increasingly  important  in  years  to 
come. 

Both  the  interpolation  problem  that  we  have  studied  and  the  more  common  problem  of  linear  prediction 
are  widely-used  paradigms  for  random  processes  that  are  frequently  encountered  in  engineering.  In  the  areas 
of  speech  processing  and  adaptive  filtering,  the  use  of  lattice-type  filters  to  estimate  the  coefficients  of  these 
models  is  very  well-known  (  [4],  [5]).  If  the  reader  is  familiar  with  the  block  diagram  for  a  lattice  filter  in 
speech,  or  with  the  Levinson  algorithm,  this  can  be  used  as  a  helpful,  intuitive  mental  model  for  the  kind 
of  problem  and  approach  in  the  research  described  in  this  report.  The  underlying  mathematics  is  different, 
but  the  overall  structure  and  results  have  a  similar  flavor,  and  can  be  applied  to  the  same  areas. 

Basically,  the  problem  of  Markov  random  field  parameter  estimation  is  the  same  as  optimum  linear 
interpolation,  i.e.  two-sided  or  noncausal  prediction.  It  is  known  that  the  problem  of  standard  (one-sided) 
linear  prediction  can  be  solved  by  orthogonalizing  the  Fourier  basis  of  powers  on  the  unit  circle;  this  is 
equivalent  to  finding  the  Cholesky  (triangular)  decomposition  of  the  inverse  covariance  matrix.  To  solve  the 
interpolation  problem,  we  have  extended  this  theory  to  orthogonalize  a  different,  Chebyshev-like  basis.  We 
have  shown  the  size-n  outer  interpolation  problem  in  one  dimension  can  be  solved  in  order  n-squared  steps  by 
a  three-term  recurrence,  somewhat  similar  to  split  algorithms.  The  solution  to  inner  (usual)  interpolation  can 
be  recovered  from  outer  interpolation  by  another  fast  recurrence,  equivalent  to  embedding  a  Toeplitz-plus- 
Hankel  problem  in  a  larger  Toeplitz  problem.  The  most  important  part  of  our  algorithm  research  in  this  area 
has  been  to  extend  this  idea  to  multidimensional  problems,  since  our  main  focus  is  in  image-processing;  more 
detailed  summaries  of  the  algorithm  steps  are  provided  in  §2.2.  We  also  have  shown  how  orthogonalization 
can  be  used  to  derive  other  least-squares  algorithms  such  as  the  Trench  algorithms  for  Toeplitz  and  Hankel 
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matrix  inversion,  and  have  defined  the  spectral  inner  product,  with  generalized  Parseval  equality.  Another 
achievement  of  this  research  was  the  introduction  of  the  Chebyshev  transform,  which  is  reminiscent  of  the 
discrete  cosine  transform,  but  also  weighted  by  the  effect  of  the  spectral  density-  this  is  relevant  to  the  new 
basis  for  interpolation. 

Because  space  is  limited  in  this  document,  only  a  very  high-level  summary  is  being  given  here  in  the 
executive  summary,  with  a  somewhat  greater  level  of  detail  in  §2.2.  Readers  who  are  interested  in  obtaining 
the  full  technical  picture  are  referred  to  [6],  which  contains  general  tutorial  material  on  “a  signal-processing 
view  of  Markov  random  fields,”  which  is  helpful  background  for  the  topics  mentioned  next  in  this  summary. 
As  a  part  of  developing  the  theory  of  estimation  for  Markov  Random  Fields,  we  have  proven  a  new,  “aug¬ 
mented”  version  of  the  Wiener-Hopf  principle  to  explain  the  spectral  form  of  the  optimum  interpolator  for 
these  processes.  It  is  known  that  the  infinite  inverse-covariance  matrix  for  such  a  process  has  the  coeffi¬ 
cients  of  the  denominator  of  the  spectral  density.  We  have  shown  that  this  is  still  nearly  true  for  a  finite 
inverse-covariance  matrix  whose  size  is  greater  than  the  order  of  the  process.  The  matrix  is  banded,  with  the 
spectral  density  coefficients  appearing  in  most  of  the  rows;  the  only  exceptions  are  called  edge  elements,  lo¬ 
cated  near  the  corners  of  the  matrix.  We  also  have  introduced  a  family  of  nearly-stationary  “triangular- AR” 
random  processes,  for  which  maximum  likelihood  and  least  squares  estimators  are  the  same,  with  closed-form 
expressions  for  covariance  inversion. 

The  practical  significance  of  this  work  is  that  coefficients  that  describe  a  random  process  can  be  estimated 
in  a  particularly  fast  and  stable  way.  As  an  example,  the  basic  ideas  from  this  research  were  applied  to  the 
problem  of  segmenting  an  image,  by  means  of  clustering  together  small  image  areas  that  have  the  same 
underlying  statistical  model.  The  statistical  model,  of  course,  is  the  Markov  random  field  coefficients,  which 
were  calculated  for  a  multidimensional,  i.e.  color,  image;  more  detail  can  be  found  in  [6].  Up  to  now, 
relatively  little  research  has  been  done  on  such  vector- valued  Markov  random  fields. 
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1.3  The  3L  Algorithm  for  Fitting  Implicit  Polynomial  Curves  and  Surfaces  to 
Data 

Of  great  importance  to  a  wide  of  variety  Image  Understanding  and  Image  Analysis  problems  is  the  ability  to 

represent  two-  and  three-dimensional  data  or  objects.  Implicit  polynomial  curves  and  surfaces  are  two  of  the 
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most  useful  representations  available.  Their  representational  power  is  evidenced  by  their  ability  to  smooth 
noisy  data  and  to  interpolate  through  sparse  or  missing  data.  Furthermore,  their  associated  Euclidean 
and  affine  invariants  are  powerful  discriminators,  making  implicit  polynomials  a  computationally  attractive 
technology  for  recognizing  objects  in  arbitrary  positions  with  respect  to  cameras  or  range  sensors.  We  have 
developed  a  completely  new  approach  [1,  2]  to  fitting  implicit  polynomials  to  data.  The  algorithm  represents 
a  significant  advancement  of  implicit  polynomial  technology  for  three  important  reasons.  First,  it  is  orders 
of  magnitude  faster  than  existing  methods.  Second,  it  has  significantly  better  repeatability  and  numerical 
stability  than  current  methods.  Third,  it  can  easily  fit  polynomials  of  high,  such  as  14*'‘  to  18*'*,  degree. 
This  approach  provides  a  completely  new  way  of  thinking  about  and  handling  implicit  polynomials. 
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Figure  1:  Two  detail  regions  taken  from  PWF  processed  SAR  image 


Figure  2:  Certain  geometric  primitives  and  their  SAR  signatures 


2  Technical  Details 

2.1  SAR  Building  Detection  in  Urban  Environments 

2.1.1  The  Nature  of  SAR  Imagery 

The  SAR  data  used  in  our  research  consists  of  various  unclassified  MIT  Lincoln  Laboratory  ADTS  mission 
passes.  Each  pass  contains  four  complex,  raw-data  signals,  corresponding  to  the  four  polarization  channels 
(HH,  HV,  VH,  and  VV)  acquired  by  the  sensor.  To  create  an  intensity  image  from  these  channels,  the 
channels  can  either  be  used  individually,  yielding  three  separate  images,  or  collectively,  yielding  a  single 
image.  We  accomplish  the  latter  using  the  Polarimetric  Whitening  Filter  (PWF)  to  produce  a  minimum- 
speckle  image.  The  SAR  image  examples  that  appear  in  this  report  were  all  PIVF  processed.  IVe  emphasize, 
however,  that  the  algorithm  we  developed  does  not  require  this;  the  input  is  any  SAR  image,  regardless  of 
how  it  was  obtained.  Figs.  4-  5(a)  show  a  typical  (2048  x  512)  PWF  processed  SAR  image. 

Fig.  1  shows  two  detail  regions  taken  from  Polarimetric  Whitening  Filter  (PWF)  processed  SAR  imagery. 
The  SAR  data  came  from  an  urban  environment,  and  the  detail  regions  show  a  mixture  of  cultural  and  natural 
structures.  How  a  building  appears  in  this  data  depends  on  its  geometry  and  composition.  That  is,  different 
scatterers  (Fig.  2)  on  the  building  give  rise  to  different  SAR  signatures. 

A  dihedral,  such  as  a  wall-ground  interface,  gives  rise  to  a  bright  line  which  appears  in  the  SAR  image  at 
a  range  corresponding  to  the  distance  between  the  radar  antenna  and  the  wall-ground  interface.  A  trihedral, 
a  corner  that  is  convex  to  the  radar  antenna,  such  as  a  inset-window  corner,  gives  rise  to  a  small  bright 


BI-ROOF-AIR  CONDITIONING  UNIT  DIHEDRAL 
B2-GROUND-WALL  DIHEDRAL  C-WINDOW  TRIHEDRALS 
D-LINE  RETURN  FROM  CYLINDRICAL  WATER  TOWER 
E-POINT  SCATTERERS  ON  ROOF 

Figure  3:  A  sketch  of  some  SAR  signal  returns 


spot.  A  cylinder,  such  as  a  water  tower,  can  produce  a  line.  Decorative  structures  on  a  roof-wall  interface 
are  small  discrete  scatterers,  which  can  produce  a  linear  pattern  of  small  spots.  Generally  there  will  be 
backscatter  containing  speckle  (which  can  often  be  modeled  as  a  short  spatial-correlation  Gaussian  process) 
from  a  surface  such  as  a  roof  or  a  rough  wall.  This  backscatter  can  be  bright  or  it  can  be  very  dim.  Finally, 
there  are  radar  shadows,  due  to  occlusion. 

Fig.  3  is  an  example  of  a  building  whose  geometry  and  composition  would  give  rise  to  the  hypothetical 
SAR  signature  shown,  according  to  the  scattering  properties  described  above.  It  is  important  to  emphasize 
that  the  actual  SAR  signature  depends  strongly  on  the  physical  construction  of  the  building.  For  example, 
improper  joints  in  window  frames  might  reduce  the  strength  of  trihedral  returns,  or  cracks  in  roofing  material 
might  create  strong  backscatter.  Because  of  this,  it  was  necessary  to  develop  a  building  detection  scheme 
capable  of  handling  nuances  in  building  signatures. 

2.1.2  Discussion  of  the  Algorithm 

As  mentioned  in  the  Executive  Summary,  buildings  generally  have  three  prominent  features  in  their  SAR 
signatures:  (1)  a  bright  line  segment  corresponding  to  the  dihedral  reflection  from  a  wall-ground  interface, 
termed  a  crease',  (2)  shadow  regions  on  the  sides  of  the  building  away  from  the  radar;  and  (3)  backscatter 
caused  by  windows,  or  smaller  scale  objects  on  the  building  walls  or  roof.  To  detect  buildings,  we  developed 
a  hierarchical  approach:  we  first  look  everywhere  for  creases,  and  then  search  for  supporting  clues  in  the 
areas  surrounding  the  most  likely  candidates. 

At  the  heart  of  the  algorithm  is  a  template  based  crease  detector.  Here,  we  use  different  features  to 
determine  if  there  is  a  crease  located  at  a  particular  position  and  orientation  in  the  SAR  image.  These 
features  are  based  on  both  local  and  global  image  information,  and  are  designed  to  capture  a  broad  range 
of  typical  crease  characteristics.  Most  importantly,  the  algorithm  design  is  modular.  Using  the  output  of 
the  crease  detector,  the  reliability  of  the  building  detections  can  be  greatly  increased  by  exploiting  other 
aspects  of  the  SAR  signature  to  verify  the  detections.  Under  this  research  program,  we  developed  modules 
to  classify  crease  geometry,  and  verify  proper  shadowing.  We  were  unable  to  pursue  a  backscatter  module 
prior  to  contract  end. 

A  challenge  in  bright-line-detection  in  SAR  imagery  is  that  the  nature  of  the  pixel-level  image  intensity 
values  can  be  of  various  types.  A  line  can  appear  as  a  thin  ridge,  a  thin  sequence  of  discrete  points,  a 
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concatenation  of  thin  ridges  and  small  blobs,  or  a  wide  swath.  There  are  various  scattering  models  that  give 
rise  to  all  of  these.  The  crease  finder  described  in  the  following  paragraphs  was  designed  to  be  robust  to  all 
of  these  types.  In  a  more  sophisticated  system,  it  might  be  worthwhile  to  interpret  estimated  line  segments 
with  regard  to  the  type  of  scattering  geometries  and  surfaces  that  could  have  given  rise  to  them. 

Creases  tend  to  be  composed  of  locally  bright  pixels  in  an  image.  That  is,  the  pixels  along  a  particular 
crease  vary  in  intensity  about  an  average  value,  while  pixels  ofiF  the  crease  are  of  lower  intensity.  As  such,  we 
briefiy  discuss  why  a  simple  threshold  technique  to  detect  creases  is  inadequate.  In  the  ideal  case,  creases 
would  be  uniformly  bright,  and  the  only  brighter  pixels  would  correspond  to  signatures  of  trihedral  reflectors, 
strong  point  reflectors,  etc.;  off  the  crease,  the  “background”  intensity  would  be  dramatically  lower.  However, 
in  the  practical  case,  creases  have  different  average  intensities  within  a  particular  image.  A  global  threshold 
would  necessarily  exclude  some  creases  from  detection.  Likewise,  the  difference  in  intensity  between  a  crease 
and  its  background  can  be  highly  variable.  A  local  threshold  would  necessarily  introduce  false  creases.  This 
is  particularly  evident  when  natural  clutter  is  present.  A  crop  of  trees  situated  in  the  correct  manner  can 
produce  one  or  more  crease-like  patches  in  its  signature.  The  solution  lies  in  combining  local  information  - 
clues  about  the  geometry  and  composition  of  a  crease  (simple  line  segment,  not  too  wide,  not  too  broken, 
and  not  too  misshapen)  -  with  global  intensity  information  to  affect  a  compromise  between  missed  and  false 
creases. 

As  alluded  to  above,  the  global  image  information  we  use  is  a  histogram  of  the  intensity  values  for  the 
entire  SAR  image.  Having  computed  this,  the  algorithm  processes  the  (2048  x  512)  ADTS  derived  SAR 
images  by  operating  on  non-overlapping  windows  of  size  {W  x  W).  W  is  taken  to  be  a  power  of  two  for 
convenience.  In  the  template  based  crease  detection  stage,  the  templates  we  use  are  elongated  rectangles 
of  dimension  (3  x  64)  pixels  at  different  orientations.  These  values  were  chosen  in  accordance  with  typical 
crease  lengths  seen  in  the  data  (related  to  building  wall  lengths),  and  are  a  parameter  of  the  algorithm. 
Fixing  the  template  length  sets  W.  Here,  W  =  64.  The  template  is  applied  to  the  SAR  data  in  a  manner 
as  to  cover  every  pixel  in  every  window  with  at  least  one  template  at  every  orientation.  To  accomplish  this, 
our  orientation  resolution  is  ^  radians. 

The  templates  are  applied  to  the  underlying  SAR  image  in  two  ways.  First,  we  use  the  intensity  data  to 
obtain  a  Constant  False  Alarm  Rate  (CFAR)  style  statistic.  That  is,  we  compare  the  average  intensity  within 
the  template  region  to  the  average  intensity  in  a  neighbor  around  the  region.  Since  a  crease  is  composed  of 
locally  bright  pixels,  we  expect  a  high  value  for  this  statistic  if  there  is  a  crease  situated  under  the  template. 
The  statistic  is  compared  to  the  appropriate  false  alarm  threshold  rate,  and  classified  according  to  outcome. 
Second,  but  only  if  the  outcome  was  class  “crease” ,  we  create  a  binarization  of  the  intensity  data  within 
the  templates.  Here,  we  compute  a  local  histogram  of  the  intensity  data  within  each  window  and  determine 
its  0.90  quantile.  This  value  is  compared  to  the  0.96  quantile  of  the  global  histogram  previously  computed, 
and  the  larger  of  the  two  is  selected  as  the  binarization  threshold.  The  effect  is  that  the  global  threshold 
acts  as  a  lower  bound  on  crease  pixel  brightness;  locally,  the  threshold  is  allowed  to  be  greater  than  this 
value.  These  quantiles  were  selected  by  extensive  empirical  study  of  the  ADTS  data.  They  were  best  able  to 
differentiate  creases  from  other  types  of  returns.  The  binarization  itself  consists  of  comparing  the  intensity 
value  of  each  pixel  in  the  template  to  the  threshold.  The  result  is  a  map  which  indicates  which  of  the  pixels 
in  the  template  are  part  of  the  candidate  crease.  This  allows  us  to  detect  creases  that  do  not  fill  the  entire 
template. 

Using  this  map,  we  perform  a  pixel  count  and  do  a  “gap  test”  -  a  structural  test  of  the  integrity  of  the 
candidate  crease.  We  require  that  at  a  minimum,  45  percent  of  the  pixels  in  the  map  be  “on” .  Although 
this  appears  to  be  a  low  number,  it  is  set  intentionally  to  account  for  thinner  creases,  the  templates  are 
three  pixels  wide,  but  creases  may  be  only  one  or  two  pixels  wide.  For  the  gap  test,  we  search  for  breaks 
in  the  direction  of  the  template,  and  require  that  these  gaps  be  no  longer  than  j  of  the  template  length,  or 
16  pixels  in  the  typical  case.  If  there  were  a  gap  bigger  than  this  size,  an  explanation  is  that  there  are  two 
distinct  creases  in  the  template.  As  a  direct  consequence  of  the  gap  test,  it  is  seen  that  two  creases  each 
smaller  than  64  pixels  and  separated  by  only  20  pixels  would  go  undetected.  To  avoid  this  possibility,  we 
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simply  repeat  the  entire  process  with  smaller  templates  (i.e.  3  x  32).  While  this  increases  the  amount  of 
computation,  superior  results  are  obtained  with  the  gap  criteria  in  place. 

If  the  data  in  a  template  passes  each  of  the  above  tests  successfully,  it  is  fully  considered  a  candidate  crease. 
Typically,  a  true  crease  will  give  rise  to  candidate  creases  in  more  than  one  template.  The  best  statistics 
will  come  from  the  template  aligned  perfectly  with  the  crease;  as  templates  move  away  (displacement  or 
orientation)  from  the  crease,  their  statistics  will  gradually  worsen.  Because  more  than  one  crease  can  be 
present  in  a  window,  choosing  a  single  best  template  is  not  appropriate.  Instead,  we  select  all  the  best 
templates  which  are  dissimilar  to  each  other  in  position  and  orientation  such  that  they  would  suggest  the 
presence  of  multiple  distinct  creases  in  the  window.  We  say  that  two  templates  are  dissimilar  if  their  centers 
are  more  than  15  pixels  apart  or  their  orientations  differ  by  more  than  0.5  radians. 

The  first  pass  is  complete  after  all  windows  have  been  processed.  Note  that  the  windows  can  be  processed 
in  parallel,  for  a  significant  computational  savings.  The  second  pass  is  identical  to  the  first,  but  with  smaller 
templates.  This  will  detect  multiple  creases  too  small  to  pass  the  first  gap  test.  Before  performing  the  second 
pass  though,  any  creases  detected  in  the  first  pass  are  removed  from  the  SAR.  image,  to  prevent  their  being 
redetected.  Upon  completion  of  the  second  pass,  all  the  detected  creases  must  be  grouped,  since  creases 
might  easily  run  for  longer  than  the  (3  x  64)  pixel  template  size.  Also,  this  allows  us  to  correct  multiple 
detections  of  a  single  crease,  by  merging  creases  with  very  similar  positions  and  orientations. 

With  high  likelihood,  each  of  the  crease-groups  represents  a  bona  fida  crease.  To  best  estimate  its  actual 
orientation  and  position,  we  fit  a  line  to  the  amalgamated  map  data  for  each  crease-groups  (all  the  pixels 
that  passed  the  threshold)  using  a  scatter  matrix  (i.e.  least-square  perpendicular  distance  error).  These  are 
the  creases  shown  in  Figs.  4-  5(b). 

To  detect  a  building,  we  recognize  that  a  typical  rectangular  building  gives  rise  to  two  creases  (unless,  of 
course,  one  of  its  walls  is  perpendicular  to  the  line  of  sight  of  the  radar,  which  we  handle  separately).  The 
key  is  to  look  for  pairs  of  creases  situated  at  approximate  right  angles  to  one  another,  or,  for  single  creases, 
at  right  angles  to  the  line  of  sight  of  the  radar  (apriori  known  in  these  images).  The  final  step  is  to  look  for 
shadows  behind  these  creases  away  from  the  radar.  This  is  again  done  by  a  negated  CFAR  statistic  method 
since  it  is  expected  that  shadow  regions  will  be  darker  than  their  surroundings.  Buildings  are  detected  if  all 
the  criteria  are  met.  Figs.  4-  5(c)  depict  the  detected  buildings  overlayed  on  the  original  SAR  image. 
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2.2  Fast  Algorithms  for  Estimating  Markov  Random  Fields 

This  section  gives  a  greater  level  of  detail  about  our  work  on  task  two  of  the  research.  The  material  is  taken 
from  Scheffe’s  thesis  [1]  which  is  available  as  a  complete  reference  for  those  who  would  like  the  full  technical 
picture,  more  background,  or  additional  references  to  the  literature.  For  this  report,  we  have  emphasized 
intuitive  and  introductory  material,  so  that  at  least  the  reader  can  see  what  the  basic  ideas  were  and  how 
they  relate  to  other  concepts  that  may  be  more  familiar.  To  communicate  the  basic  ideas  while  avoiding 
some  tedious  mathematical  details,  results  are  often  stated  just  for  one-dimensional  processes,  even  though 
they  are  valid  in  two  dimensions  with  a  more  complicated  formulation. 

2.2.1  What  is  a  Markov  Random  Field? 

As  indicated  in  the  accompanying  box,  there  are  at  least  four  important  ways  of  describing  spatial  random 
processes  known  as  Markov  random  fields  (MRF).  Unfortunately,  it  is  not  possible  to  give  a  complete  tutorial 
here  about  these  viewpoints  and  why  they  are  equivalent,  but  we  will  at  least  make  a  formal  definition  here 
sketching  the  first  viewpoint.  The  next  viewpoint,  involving  least  squares  estimates  of  model  parameters, 
is  probably  most  familiar  to  the  general  audience.  This  second  viewpoint  is  described  in  some  detail  in 
§§2. 2.3-2. 2.6,  so  nothing  further  will  be  said  about  it  here  in  §2.2.1.  Finally,  here  in  §2.2.1,  we  will  give  a 
brief,  intuitive  introduction  to  the  last  two  viewpoints.  A  good  deal  more  information  about  the  last  two 
is  available  in  [1],  including  an  interesting  generalization  of  the  Wiener-Hopf  theorem.  Since  our  guiding 
principle  here  is  to  present  a  selection  of  intuitive  and  conceptual  material  (at  the  possible  risk  of  being  too 
introductory  or  possibly  brief),  the  Wiener-Hopf  material  has  been  omitted. 


Summary  2.1  (Four  Views  of  Stationary  Gauss-Markov  Random  Fields) 

Listed  below  are  four  different  mathematical  objects  that  are  specified  by  certain  coefficients. 
Although  the  mathematical  descriptions  are  different,  the  coefficients  have  to  be  the  same  for 
a  zero-mean,  stationary  Gaussian  Markov  random  field,  and  are  referred  to  as  interpolation 
coefficients  or  random  field  parameters. 

•  coefficients  gi  of  the  data  variables  yi  in  the  Gaussian  conditional  mean,  Ms  |  y  = 
'^k  OkUk',  this  describes  the  probability  the  center  variable  x  takes  on  a  certain  value, 
conditioned  on  the  (finite!)  set  of  relevant  neighboring  values  yk- 

•  solution  to  the  problem  of  finding  the  optimum  linear  interpolation  estimator  for 

the  random  process  :  (Jg)fc  =  Qi^k+i 

•  coefficients  of  the  inverse  spectral  density  Q{z)  =  llS{z)  =  qo  (l  -  Y^ki^o  9kZ~'°) 

•  entries  in  the  rows  of  the  inverse  covariance  R“^  =  except  for  edge  effects 

near  the  corners  of  the  matrix.  This  is  a  banded  matrix,  which  is  Toeplitz,  except  for 
edge  effects.  The  quadratic  form  |x'^R“^x  is  an  important  example  of  a  potential 
function,  an  equivalent  way  to  specify  a  Markov  random  field. 


Conditional  Mean  Approach-  Neighbors  and  Interactions  in  a  Lattice: 
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2/2  —  Xi—ij 

Vi  = 

X  —  Xi^j  — ? 

2/4  —  Xij^i 

2/3  = 

Random  fields  are  often  described  as  values 
Xij  in  a  two-dimensional  lattice.  Intuitively, 
a  Markov  random  field  has  the  property  that 
only  finitely  many  neighbors  influence  the  field 
value  at  a  central  location  in  the  lattice.  This 
is  illustrated  at  the  left,  which  shows  a  cen¬ 
tral  pixel  at  location  (i^j)  in  a  large  array 
of  numerical  data,  that  could  represent  any¬ 
thing  from  grey-level  image  values  to  yields 
of  potato  plants  in  the  agricultural  field  test 
station.  An  example  of  the  Markov  idea  is 
that  the  center  value  Xij  is  influenced  only  by 
the  immediate  neighbors  to  the  west,  north, 
south,  and  east.  These  neighboring  values  are 
labelled  yi,y2,y3,2/4  for  simplicity  in  the  dia¬ 
gram. 


For  this  example,  the  Markov  property  is  expressed  by  saying  that  the  conditional  probability  that  the  center 
variable  Xij  takes  on  a  certain  value  x  depends  only  on  the  y’s,  even  if  a  much  greater  amount  of  neighboring 
data  is  given: 


pr\ 


all  other  pixel 
values  in  image 


=  pr  Xij  =  X 


=  m  \ 


—  yi 

—  2/2 

~  2/3 

—  2/4 


=  p»’(a;|y) 


(2.2.1) 


J 


I  I  -  / 

This  is  an  example  of  a  first-order  Markov  random  field,  where  the  neighbors  influencing  the  center  value 
are  no  more  than  the  minimum  distance  away  from  the  center  point  in  the  lattice. 

A  general  Markov  random  field  whose  order  is  at  most  n  is  defined  by  requiring  that  for  each  central 
lattice  site  (i,  j),  only  its  neighbors  of  order  up  to  n  influence  the  random  field  value  there: 


pr  Xij  -  X 


all  other  pixel 
values  in  image 


=  pr  Xij  =  X 


Xk,l  =  Vk.l 


(2.2.2) 


(kj)  a  pth-order  neighbor 
I  of  {i,j),  P<n,  {k,l)  ^  {i,j)  ^ 

For' the  order  of  a  Markov  field  to  be  uniquely  defined,  we  can  keep  whittling  down  the  set  of  neighbors 
on  the  right  until  we  are  beyond  the  critical  size  for  which  the  Markov  property  is  true.  So  in  general,  an 
mth  order  Markov  random  field  is  defined  by  taking  m  to  be  the  minimum  of  the  integers  n  satisfying 
the  right  side  of  (2.2.2).  The  values  of  the  neighbors  which  influence  the  central  pixel  value  x  often  will 
be  denoted  simply  as  y  —  (yi, * •  •  using  a  simple  scalar  index  f,  rather  than  a  planar  index  like 

These  will  be  called  data  variables:  given  a  central  site  location,  the  data  variables  correspond  to 
the  neighboring  sites  which  must  be  included  in  the  conditional  probability  (2.2.2).  So  far,  the  definition  we 
have  just  made  is  satisfied  by  a  great  variety  of  random  processes  which  are  neither  stationary  nor  Gaussian. 
However,  making  these  addtional  requirements  means  that  the  conditional  probability  in  (2.2.2)  has  to  be  a 
linear  combination  (  weighted  sum)  of  the  data  values  yi.  The  weights  in  this  sum  are  the  Markov  random 
field  coefficients  gr,  just  as  with  many  other  least-squares  problems  such  as  prediction,  they  can  be  found  by 
solving  the  appropriate  set  of  normal  equations  (§2.2.3). 
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This  general  definition  involving  neighbors  with  finite  influence  on  a  lattice  applies  to  an  enormous  variety 
of  interesting  spatial  phenomena,  ranging  from  magnetic  moments  in  a  lattice,  to  geographical  distributions 
of  human  social  phenomena,  to  the  distributions  of  galaxies.  For  a  good  deal  of  such  processes,  it  is  not 
necessary  to  make  the  stationary  Gaussian  assumption  mentioned  in  the  box;  generalizations  of  the  first  and 
fourth  viewpoints  still  define  what  a  Markov  random  field  is. 


Survey—  Covariance  Inversion,  Interpolation  and  the  Spectral  Density:  Here,  we  will  give  an 
introduction  to  a  very  useful  relationship  among  the  inverse  covariance,  the  optimal  interpolator,  and  the 
spectral  density  of  a  second-order  stationary  random  process.  ^  This  relationship  does  not  require  that 
the  spectral  density  denominator  be  factorizable  into  “poles”  (especially  unlikely  in  2D),  but  it  does  require 
that  it  be  a  symmetric  polynomial- type  form,  which  is  explained  more  in  [1].  This  is  a  very  mild  general 
condition,  and  is  satisfied  by  most  processes  studied  in  signal  processing. 

As  an  example  of  the  relationship,  we  will  consider  the  second-order  AR  process  in  one  dimension,  where 
the  difference  equation 

Xk  =  axk-i  +  bxk-2  +  tcfe,  a,6  e  IR,  ^  ^  (2.2.3) 

is  driven  by  white  noise  rcfc.  The  frequency-space  version  of  this  difference  equation  is  {l-az~^-bz~'^)  X{z)  = 
W(z),  where  W{z)  is  the  input  and  X{z)  is  the  output.  So  the  associated  transfer  function  is  H{z)  = 
1/(1  -  az-^  -  bz^^),  and  thus  the  spectral  density  S{z)  for  this  process  must  be  given  by  H[z)  H{z): 


1 

(l  -  az-i  -  bz~‘^)  (l  -  az-'‘-  -  bz-‘^) 


(2.2.4) 


(1  +  +  62)  +  a(l  +  b)[z->r  z"!]  -  +  ^-2] 

The  denominator  here  is  a  good  example  of  one  of  those  symmetric  forms  in  z  and  z~''-  =  x  which  we  just 
alluded  to. 
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^  • 

(2.2.5) 


discussed  in  [1]. 

The  reason  for  including  this  result  in  a  discussion  of  Markov  random  field  coefficients  should  be  clear  if 
we  look  at  the  optimal  interpolator  associated  with  this  random  process: 

(2.2.6) 

Again,  it  is  not  a  coincidence  that  the  spectral  density  coefficients  have  appeared  in  the  optimal  interpolator- 
this  section  will  show  that  this  actually  can  be  seen  as  an  outcome  of  a  certain  kind  of  Wiener-Hopf  theory. 
All  these  concepts  are  so  closely  related  for  a  Gauss-Markov  process  that  the  inverse  covariance  can  be 

^For  this  discussion,  it  is  actually  not  necessary  to  make  the  Gaussian  assumption,  provided  the  phrase  “optimum  estimator” 
is  replaced  with  “optimum  linear  estimator,” 
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identified  with  the  optimum  interpolator,  up  to  a  scaling.  This  is  important,  because  the  inverse  covariance 
appears  in  the  argument  of  the  exponential  in  the  basic  Gaussian  distribution 


p(x)  = 


1  i  (2.2.7) 

27rt  |R|5  ^ 

This  quadratic  form  V(x)  =  |  x'^R“^x  is  often  called  a  potential  function,  as  in  the  Hammersley- Clifford 
theorem,  a  central  part  of  the  lore  of  Markov  random  fields. 

Another  example  of  the  relationship  between  covariance  inversion  and  interpolation  comes  from  the 
first-order  AR  process  x^+i  =  axk  +  Wk+i ,  where  the  coefficients  in  the  spectral  density 

(1  +  0^)  -  a[z  +  z-^] 
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1  +  a? 

—a 
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—a 

—a 
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-a 

1  +  a2 

—a 

i 
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0 

—a 

R-i  = 


Again,  the  spectral  density  coefficients  also  occur  in  the  interpolator  for  this  first-order  AR  process 
^k—1  "h  ^k+1  ^ 

The  inverse-covariances  in  the  two  examples  above  were  both  infinite  matrices.  Some  not- very-major 
corrections  have  to  be  made  in  the  upper  lefthand  and  lower  righthand  corners  to  apply  this  idea  to  finite 
covariance  inversion.  We  call  these  edge  effects,  and  discuss  how  to  calculate  them  in  [1].  Because  of  these 
edge  effects,  a  finite  inverse-covariance  is  prevented  from  being  completely  Toeplitz:  the  elements  will  not 
be  constant  down  a  diagonal  in  the  vicinity  of  the  top  left  and  bottom  right  comers.  After  all,  a  stationary 
covariance  matrix  has  to  be  Toeplitz,  and  generally  the  inverse  of  a  Toeplitz  matrix  does  not  usually  have 
the  Toeplitz  property.  However,  a  finite  Gauss-Markov  inverse-covariance  is  nearly  Toeplitz:  away  from  the 
corners,  the  matrix  elements  do  follow  a  Toeplitz  pattern,  with  coefficients  supplied  by  the  spectral  density. 

The  spectral  density  (2.2.4)  for  the  second-order  AR  process  turned  out  to  involve  symmetric  terms  in 
2“ 2^  ^—1^  ^  ^2  Pqj.  general  mth  order  process  in  one  dimension,  the  denominator  will  involve  terms 

up  to  0*'”,  which  means  that  rows  in  the  inverse  covariance  matrix  will  extend  m  units  to  the  left  and  m 
units  to  the  right  of  the  main  diagonal.  This  says  that  for  a  Gauss-Markov  process  of  order  m,  the 
inverse  covariance  is  a  banded  matrix  with  2m  -f  1  nonzero  diagonals.  Away  from  the  edges, 
each  row  consists  of  coefficients  from  the  spectral  density  denominator.  The  entries  are  somewhat 
different  near  the  northwest  and  southeast  corners,  but  they  do  still  do  not  extend  more  than  m  steps  away 
from  the  main  diagonal  before  going  to  0.  In  conclusion,  this  is  an  interesting  and  useful  characterization 
of  Gauss-Markov  covariances,  and  it  unifies  the  interpolation  problem  (=  estimation  of  the  random  field 
parameters)  with  spectral  estimation  for  such  processes. 


2.2.2  Meiximum  Likelihood,  Least  Squares  and  Realizations 

In  order  to  have  more  space  to  discuss  other  topics,  just  a  very  brief  summary  of  some  contributions  in  this 
area  will  be  given  here  (the  details,  as  usual,  are  in  [1]).  First,  a  theorem  about  equivalent  realizations 
was  proven,  which  shows  the  relationship  between  random  processes  defined  by  predictive-type  difference 
equations  and  ones  defined  by  interpolation-type  processes.  This  theorem  offers  some  insights  on  the  spectral 
properties  of  simultaneous  auto-regressive  processes,  which  are  symmetric  difference  equations  driven  by 
white  noise.  Another  accomplishment  was  the  introduction  of  a  family  of  nearly-stationary  random  processes 
called  triangular  AR  (auto-regressive)  processes.  These  are  very  close  to  stationary  AR  processes,  and  the 
parameter-estimation  methods  of  maximum  likelihood  and  least  squares  are  identical  for  these  processes. 
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which  might  be  surprising.  Scheffe  showed  that  there  is  a  simple,  closed-form  analytical  expression  for  the 
inversion  of  a  triangular-AR  covariance.  A  discussion  in  [1]  shows  that  in  some  sense,  that  triangular  AR 
covariances  are  actually  “closer”  to  true  stationary  AR  covariances  than  the  Toeplitz  approximation.  (The 
Toeplitz  approximation  is  a  slightly-incorrect  expression  for  a  Markov  random  field  inverse-covariance,  which 
appears  quite  widely  in  the  literature).  We  also  gave  a  new  construction  of  an  analytical  inverse  for  a  finite 
banded,  triangular  Toeplitz  matrix,  based  on  “partial  fractions  for  matrices.” 

2.2.3  Orthogonalization  in  One  Dimension 

Introduction  and  Outline  on  the  Importance  of  Orthogonality:  A  good  introduction  to  the  basic 
idea  here  just  consists  of  looking  the  familiar  covariance  matrix  in  a  new  way.  Basically,  all  the  important 
properties  of  the  covariance  matrix-  as  used  in  linear  prediction  and  many  other  applications-  really  come 
from  the  fact  that  it  is  a  set  of  inner  products 
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The  covariance  matrix  R"  on  the  left  is  very  familiar  as  the  fundamental  ingredient  for  solving  the  linear 
prediction  problem,  as  represented  by  the  normal  equations.  The  matrix  on  the  right  may  look  less  familiar, 
but  all  the  key  properties  used  in  signal  processing  really  come  from  its  nature  as  a  collection  of  inner  prod¬ 
ucts.  The  inner  products  are  essentially  just  a  statement  of  the  Fourier  relationship  between  the  covariance 
elements  {rk)k  and  the  spectral  density  S{z).  They  involve  powers  of  the  variable  ^  =  e*®  on  the  unit  circle, 
which  is  the  natural  machinery  for  expressing  a  Fourier  analysis.  So  one  goal  of  this  section  is  to  make  the 
righthand  side  matrix  more  intuitive  if  it  seems  unfamiliar,  because  it  is  really  an  effective  way  of  looking 
at  least-squares  problems,  and  is  not  limited  to  just  prediction  problems.  Great  profit  is  obtained  here 
from  the  fact  that  the  ideas  and  approach  can  be  extended  to  other  sets  of  functions  besides  the  unit-circle 
Fourier  basis  {z'^}k  =  {e'*®}*,.  A  matrix  G  such  as  the  one  on  the  right,  which  consists  of  inner  products 
of  functions  Gki  =  {z’‘,z‘)  is  called  a  Gramian  matrix.  More  generally,  the  inner  products  in  a  Gramian 
could  be  of  general  basis  elements  Gy  =  {(p^{z),ip^(z)),  which  are  not  necessarily  powers  of  ;z-  in  §2.2.5  we 
will  examine  another  basis,  which  is  ideally  suited  to  solving  interpolation  problems.  Sometimes  we  denote 
the  basis  as  a  subscript  appearing  with  the  Gramian  matrix,  as  in  Gj,  G,^,  respectively. 

The  basic  message  here  is  that  all  the  important  properties  of  the  covariance  R  come  from  the  inner 
product  structure  on  the  right,  and  thus  can  be  generalized  to  our  main  interest,  the  interpolation  problem. 
By  contrast,  the  specialized  property  of  the  covariance  R  being  Toeplitz  does  not  generalize  to  interpolation- 
this  means  that  the  Levinson  algorithm  cannot  just  be  blindly  copied  for  Markov  Random  Field  problems. 
What  does  generalize  to  interpolation  (and  other  least-squares  problems)  is  the  idea  that  linear  prediction 
really  is  an  orthogonalization  procedure.  This  chapter  emphasizes  the  geometric  interpretation  of  this 
idea,  which  rests  just  on  properties  of  inner  products,  not  on  more  specialized  structure  of  the  covariance. 

The  idea  of  this  orthogonalization  procedure  (which  we  call  the  Szego  Recipe)  is  that  the  unit  circle  basis 
functions  {z^}k  are  very  nice  for  bland  Fourier  series  work,  but  that  they  fail  to  be  the  best  statistical  basis 
for  solving  a  least  squares  problem.  This  is  because  the  original  basis  functions  {ar*}*,  are  not  orthogonal 
with  respect  to  the  following  basic  stochastic  (spectral  or  covariance)  inner  product,  which  describes  the 
entries  of  the  two  matrices  above; 

"E  (^Xi  Xj)  —  Ti—j  —  ^  ) 

If  the  righthand  side  above  were  Si-j,  then  the  z'‘  would  have  started  out  already  orthogonal,  i.e.  the 
covariance  matrix  R  would  be  equal  to  the  identity  matrix  I-  this  would  say  the  random  variables  (xk)k 
are  uncorrelated  (white).  But  in  general,  it  is  quite  unlikely  to  expect  that  R  =  I  for  an  arbitrary  random 
process;  noise  is  rarely  white  or  uncorrelated.  The  whole  idea  of  orthogonalization  is  that  a  new  basis 
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is  created  whose  Gramian  matrix  of  inner  products  is  the  identity  I-  essentially,  this  is  the  same  idea  as 
whitening.  Otherwise  said,  the  orthogonalization  process  can  be  viewed  as  a  procedure  which  produces 
new,  statistically  independent  variables  out  of  the  originally  dependent  (correlated)  variables-  one  could  say 
that  these  are  as  well-conditioned  as  possible,  from  the  numerical  point  of  view. 

The  chapter  of  [1]  on  this  material  also  shows  that  a  quite  general  orthogonalization  problem  can  be 
seen  as  really  boiling  down  to  linear  prediction.  The  key  generalization  is  just  to  replace  the  unit-circle  basis 
{z'‘}k  with  a  more  general  basis  {(p'^}k,  but  otherwise  all  the  steps  are  the  same-  they  really  just  depend 
on  straightforward  inner  product  manipulations.  This  generalization  is  extremely  important  for  solving  the 
interpolation  problem.  As  examples  of  how  useful  the  orthogonalization  viewpoint  is,  it  is  used  there  to  give 
some  one-line  proofs  of  important  results:  the  Cholesky  decomposition  and  Szego’s  determinant  expression 
for  constructing  orthogonal  polynomials.  There  are  many  fascinating  relationships  between  the  theory  of 
orthogonal  polynomials  and  problems  of  linear  algebra  (see  e.g.  [2]).  The  end  of  this  chapter  discusses  some 
interconnections  between  the  Gram-Schmidt  or  QR  decomposition,  Schur  and  Levinson-Szegb  procedures, 
and  Cholesky  decomposition. 

Bibliographic  Survey:  The  idea  of  solving  the  prediction  problem  by  orthogonalizing  the  unit-circle  powers 
has  appeared  here  and  there  in  the  signal-processing  literature-  brief  treatments  can  be  found  in  [3], 
[4].  However,  this  theory  has  not  been  popularized  or  used  frequently;  it  is  likely  that  the  covariance-as- 
Gramian  in  (2.2.1)  is  not  familiar  to  some  people.  The  classic  reference  on  the  orthogonal  polynomials  is 
Szegb  [5];  original  work  on  the  unit  circle  polynomials  was  also  done  in  the  1930’s  by  Geronimus  [6],  and 
the  classic  with  emphasis  on  probability  applications  is  [7]. 

The  rest  of  this  section  introduces  a  number  of  concepts  which  are  all  very  central  for  the  fast  algorithm 
descriptions: 

•  the  geometry  of  inner  products,  especially  Gramian  matrices  and  the  spectral  inner  product 

•  what  order-recursive  equations  look  like  for  optimum  prediction  and  optimum  interpolation  in  1  di¬ 
mension,  as  shown  in  the  page-sized  box. 

•  error  polynomials,  which  are  the  basic  tool  for  describing  lattice-type  filters/estimation  algorithms, 
and  require  a  set  of  augmented  linear  equations 

•  the  notion  of  generalized  prediction,  which  is  an  original  contribution  of  this  research,  and  uses  Gramian 
matrices  rather  than  covariance,  Toeplitz  or  Hankel  matrices 


The  Spectral  Inner  Product:  The  inner  products  appearing  in  the  Gramian  matrix  (2.2.1)  are  not 
mysterious-  they  are  just  a  statement  of  the  familiar  Fourier  inversion  property  relating  the  covariance- 
/correlation  elements  Vk  to  the  spectral  density  S{z): 

rk  =  i^  z'=S(z)  5^ 

Here,  F  is  a  simple,  clockwise-oriented  closed  contour  containing  the  unit  circle.  For  simplicity,  we  will  be 
looking  at  a  one-dimensional  stationary,  zero-mean  Gaussian  random  process  {xk)k-  Fourier  inversion  implies 
that  elements  of  the  covariance  matrix  can  be  written  as 

I 

(2.2.2) 

Here,  “E”  denotes  the  expectation  operator  (averaging),  and  a  bar  indicates  complex  conjugation. 

Most  least-squares  problems  require  computing  a  mean-square  error  using  the  expectation  norm  in  the 
time  domain.  For  example,  the  order- n  prediction  problem  seeks  model  coefficients  oi,  02,  .  ,an  to 
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minimize  the  mean-square  prediction  error 

(cr^)”  =  ]E(  I  a;„  -  aiXn-i - a„xo  P)  (2.2.3) 

A  useful  property  of  the  spectral  inner  product  is  that  it  can  change  a  time-domain  expectation  like  this  into 
a  statement  involving  just  algebra  with  ^’s  in  the  frequency  domain.  For  example,  in  the  time  domain  let 
f  =  (/o,/i, . .  • ,  /n-i),  g  =  •  •  •  iffn-i)  be  complex  coefficient  vectors  describing  the  sums  of  shifted 

random  variables  Efc=o  fk^k,  Efc=o  9kXk-  Then  the  time-domain  expectation 

/jfca;*,  =f^Rg=  fkRtidi  =  (f)g)R  (2.2.4) 

describes  an  inner  product  of  the  two  coefficient  vectors,  since  the  size-n  covariance  matrix  is  positive  definite. 
The  Stochastic  ParsevaPs  Relation 

(f,  g)R  =  {F{z),  G{z))s(z)  (2.2.5) 

equatU  this  time  domain  inner  product  to  a  different,  frequency-domain  inner  product  of  the  transforms 
F{z)  =  Efc-o  G{z)  =  '£,l=l9kz’‘.  (These  are  polynomial  or  forward-indexed  transforms  to  avoid 

confusion  with  complex  conjugates—  for  interpolation  problems,  both  positive  and  negative  powers  of  z  will 
be  needed).  In  the  frequency  domain,  each  factor  Rki  =  Vk-i  in  the  sum  from  (2.2.4)  is  replaced  with 
{z'‘,  z^)s  from  (2.2.2),  verifying  (2.2.5): 

2  fk9l{z^z%  =  i  F{z)S{z)W)^.  =  (Fiz),  G{z))s  (2.2.6) 


The  Geometry  of  Prediction:  The  Gram-Schmidt  process  is  a  familiar  way  to  to  orthogonalize  a  set 
of  vectors,  by  ^subtracting  off  projections.”  The  very  same  idea  can  be  used  to  describe  the  order-recursive 


W1  =  VI  -  Proj 


process  of  fitting  first  a  one-step  predictor,  then  a  two-step  predictor  and  so  on,  to  a  set  of  data.  More 
specifically,  the  Gram-Schmidt  process  takes  an  independent,  non-orthogonal  set  of  vectors  vq,  vi,  . . . ,  v„, 

and  turns  them  into  a  new  set  of  orthogonal  vectors  wq  =  vq,  wi,  . . . ,  w„  via 
v„*w„_i  _  v„.wo 

w„  =  Vn - W„_1  —  .  .  . - Wo 

W„_1*W„_1  Wo 'Wo  ^  _ 

Note  that  here  the  (usual,  Euclidean)  dot  product  of  complex  vectors  here  is  a*b  =  Efc=o  h-  But  both 
the  geometry  and  the  algebra  apply  to  other  dot  products  like  ( . ,  .  )r  or  ( . ,  .  )s{z)-  The  latter  is  a  dot 
product  of  functions  on  the  circle  rather  than  vectors. 

Instead  of  starting  with  a  basis  of  vectors  {v^}*,,  the  process  for  prediction  models  starts  with  the  basis 
of  powers  {z*'}*,  on  the  unit  circle  [8].  This  basis  is  not  orthogonal  with  respect  to  the  spectral  inner  product 
(.,  .)s  unless  the  covariance  R  happens  to  be  the  identity  matrix  (white  noise).  By  orthogonalizing  the 
“vectors”  z°,  ...,z",  we  are  whitening  the  covariance  matrix,  i.e.  finding  its  innovations  or  Cholesky 

decomposition.  Instead  of  being  denoted  by  “w*,,”  the  new  basis  consists  of  error  polynomials 

e*(2)  =  z*  -  - anZ°  (2.2.7) 

Note  that  the  new,  orthogonalized  version  of  z'‘  is  expanded  out  in  terms  of  the  original  basis  {z**}*,,  whereas 
the  Gram-  Schmidt  process  expands  out  a  new  Wk  in  terms  of  the  new,  orthogonalized  basis  of  lower-order 
w’s.  Also,  the  Gram-Schmidt  process  is  less  efficient  than  modern  ways  to  compute  error  polynomials  because 
it  requires  k  dot  products  at  step  k,  whereas  say  the  Levinson  algorithm  only  requires  2.  But  the  Levinson 
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process  and  the  Gram-Schmidt  process  are  equivalent  mathematically-  this  can  be  verified  by  substituting 
an  expansion  for  the  new  basis  in  terms  of  the  old. 

Prediction  Compared  with  Interpolation:  In  a  general  estimation  problem,  one  might  be  trying  to 
forecast  a  certain  value  of  a  random  process  using  a  linear  sum  of  other  known  values.  For  example,  nth-order 
linear  prediction  looks  to  the  n  most  recent  past  values  to  construct  an  estimate: 

a^Xk-i  +  a^Xk-i  -h  . . .  +  (2.2.8) 

Here  the  hat  denotes  that  an  estimate  is  being  made,  and  A:  is  a  general  (discrete)  time  index.  Since  we 
are  assuming  stationarity,  it  is  often  convenient  to  take  fe  =  0.  By  contrast,  second-order  interpolation,  for 
example,  involves  two  neighbors  from  the  past,  and  two  from  the  future: 

($o)p  =  9-2X-2  +  9-iX-i  -t-  ffiari  -t-  92X2  (2.2.9) 

The  page-sized  figure  illustrates  a  number  of  examples  of  prediction  and  interpolation  of  different  orders. 
Note  that  the  tradition  for  defining  the  model  order  for  interpolation  uses  the  number  of  neighbors  on  a  side, 
not  the  total  number  of  neighbors.  The  letter  “o”  will  be  reserved  for  prediction  coefficients,  and  “5”  for 
prediction  coefficients  (“/3”  is  also  common  for  Markov  fields).  The  kind  of  model  is  indicated  as  a  subscript, 
and  the  model  order  as  a  superscript. 

The  Normal  Equations  in  Frequency  Space:  The  normal  equations  describe  how  to  find  the  optimum, 
minimum-error  estimator  coefficients  in  terms  of  the  covariance  elements  r*, .  They  can  be  expressed  in  terms 
of  the  error  process,  which  for  nth  order  prediction  is  defined  as 

(e”)fc  =  Xk-  ixa)k  =Xk-  a^Xk-i  -  ...  -  a^Xk-n  (2.2.10) 

The  coefficients  of  this  error  process  will  also  be  denoted  as  the  error  vector  e^  =  (1,  -a^*, . . . ,  -a”)  ; 
these  are  the  same  as  the  coefficients  of  the  error  polynomial  (2.2.7).  We  do  not  follow  the  tradition  of 
incorporating  the  minus  sign  into  the  coefficients,  because  we  will  want  to  define  non-predictive  kinds  of 
errors. 

The  normal  equations  (see  [9],  §13.1)  say  that  this  error  process  must  be  uncorrelated  with  each  of  the 
variables  Xk  used  to  make  up  the  estimate.  For  prediction,  this  can  be  written: 

0  =  lE^  \^Xn  (liXji—i  *  •  •  OjjiXq  j  Xk  ^  (2  2  11) 

=  IE(e"  XT')  =  IE(xi,  if)  A:  =  1,  2, . . . ,  n 

Expanding  out  the  expectations  with  ]E(xfcx7)  =  Vk^i  would  give  the  familiar  Yule- Walker  equations.  Our 
interest,  though  is  to  express  these  relations  in  frequency  space,  so  we  replace  JE{xk^)  with  {z'^,  z‘)s-  What 
appears  is  the  error  polynomial  (2.2.7)  for  prediction: 

0  =  ( 0"  -  - anZ°,  z^  )s  =  (e" (^))  (2.2.12) 

So  the  frequency-space  normal  equations  ask  that  the  error  polynomial  e”(z)  be  orthogonal  to  all  the  preceding 
basis  elements  z°,  z^,  . . .  This  is  why  we  can  identify  prediction  with  the  Gram-Schmidt  process  to 

orthogonalize  the  basis  elements  {z'°}k-  It  also  explains  why  the  covariance  is  the  same  as  the  Gramian  of 
inner  products  in  (2.2.1). 


Theorem  2.2  (Szego  Recipe)  The  e"'{z)  error  polynomials  can  be  obtained  by  orthogonalizing  the  basis 
{z^,  z^,  z‘^,.. . ,  z"}  with  respect  to  the  spectral  inner  product  {■,  ■)s- 
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Although  usually  the  indices  for  prediction  go  backwards  in  time,  it  is  much  more  convenient  to  have  the 
indices  agree  with  the  coefficient  of  We  will  change  notation  from  “a”  to  “c,”  writing  the  error  polynomial 
as 


n— 1 


4  =  an-k 


(2.2.13) 


fc=0 


These  ideas  about  orthogonalizing  error  polynomials  are  not  limited  to  the  basis  {z’'}k,  nor  to  the  spectral 
inner  product  {z'^,z^)s-  A  version  of  these  “Yule- Walker”  equations  occurs  in  every  book  on  how  to  orthogo- 
nalize  a  polynomial  basis  (e.g.  [10]  §1.3),  but  usually  without  making  the  connection  with  linear  prediction. 
A  matrix  system  expressing  the  frequency  space  normal  equations  for  orthogonalization  is 
f  {z^,z°)  {z\z^)  •••  f{z\z^)\ 

a" 


(z^,z'^)  {z^,z^) 


(0"  ^,z^) 


-J-l 


{z'^.z^) 


V(0«,^”-1)  {z\z^-^)  •••  (z"-i,0"-i)/ V  a?  /  \(^”,^"-i)/  _ 

A  linear  system  like  this  will  be  called  the  generalized  prediction  equations.  It  applies  to  very  general 
orthogonalization  problems-  the  basis  need  not  be  the  {z^}k,  and  the  inner  products  do  not  have  to  be 
spectral  inner  products.  Note  that  the  a  indices  run  “backwards”  here,  but  the  c  indices  would  go  forwards. 


For  brevity,  the  Gramian  matrix  of  inner  products  above  will  also  be  denoted  by  Gki  =  {z'^,z^) 
minimum  mean-square  error  associated  with  the  normal  equations 
(a^)”  =  ]E(e”^) 

=  (^",  z^  -  cr'  •  •  •  -  {z^  z^) 


The 


(2.2.14) 


can  be  incorporated  with  the  above  into  a  larger  system,  the  augmented  generalized  prediction  equations: 

(  0  \ 

-rf-*  0 


0 

0 

’  *  Gn-1,0 

Gn,0 

Go,i 

'  *  Gn-1,1 

Gn,l 

G^0,n-1 

•  •  Gn-l,n-l 

Gn,n—1 

VGo.n 

*  *  Gn~l,n 

Gn,n 

(g:)" 

-c; 


,n— 1 
n— 1 


V  1  J 


0 


(2.2.15) 


2.2.4  Interpolation  in  ID:  a  Levinson-style  Approach 

General  Preview-  Estimating  an  Optimal  Interpolator:  In  this  research,  we  considered  two  different 
approaches  to  the  interpolation  problem.  Both  of  these  approaches  the  unifying  theme  of  fitting  a  non-causal, 
two-sided  model  to  data  for  the  random  process  {xk)k-,  the  page-sized  figure  in  §2.2.3  gives  a  graphical 
comparison  between  two-sided  (interpolation)  models  and  one-sided  (predictive)  models,  which  are  more 
common  in  mainstream  signal  processing. 

First,  in  §2.2.4,  we  will  look  at  a  Levinson-style  approach  to  the  interpolation  problem.  This  is  very 
similar  to  the  usual  machinery  in  mainstream  signal  processing  for  predictive-type  problems,  which  makes 
it  particularly  straightforward.  Basically,  this  approach  changes  a  size-n  interpolation  problem  into  a  size- 
(2n-l- 1)  standard  prediction  problem.  This  theme  has  also  appeared  in  [11],  [12],  and  other  work  mentioned 
in  the  survey  below.  An  original  contribution  of  this  research  is  the  basic  strategy  is  of  considering  two  kinds 
of  interpolation  problems,  which  are  shown  in  the  box. 

The  usual  meaning  that  people  attach  to  the  word  “interpolation”  corresponds  to  the  first  model,  called 
inner  interpolation:  we  try  to  guess  the  value  of  a  random  process  at  a  central  point,  given  data  from 
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INNER  INTERPOLATION 


•  •  •  X-3  X-2  X-i  Xo  Xi  X2  X3 

m 


2nd-Order  Inner  Interpolation 

=  92^-2  +  9iX-i  +  glxi  +  glx2 


OUTER  INTERPOLATION 


x-3 


m 


x-2 

v/ 


X-\  Xq  Xi  X2 

V  V  V  V 


2nd-Order  Outer  Interpolation 
{x-'f+xs)^  =  fiX-2  +  fiX-i 

+  fo^O  +  flXl  +  fix2 


both  its  immediate  neighbors  to  the  left  and  neighbors  to  the  right.  The  weighting  coefficients  assigned  to 
the  neighbors  in  this  inner  interpolation  model  are  denoted  pf .  The  number  of  neighbors  determines  the 
order  of  the  model-  it  is  easy  to  show  that  the  data  must  be  used  symmetrically.  Outer  interpolation,  on 
the  other  hand,  starts  from  a  complete  interval  of  data  where  the  center  point  xq  is  not  missing,  and  tries  to 
guess  a  value  for  the  average  value  lying  just  outside  the  data  interval.  Outer  interpolation  coefficients  will 
be  denoted  /,". 

The  interpolation  problem  can  then  be  solved  using  the  following  basic  principles: _ 

Three  Interpolation  Principles 

•  Outer  interpolation  is  a  special  case  of  generalized  (forward)  prediction. 

•  Inner  interpolation  is  a  case  of  generalized  postdiction  (=  backward  prediction). 

•  A  generalized  postdiction  problem  can  be  solved  using  the  solution  to  generalized  predic¬ 
tion,  but  not  vice  versa.  Thus,  the  standard  =  inner  interpolation  problem  can  be  solved 
in  terms  of  the  solution  to  outer  interpolation. 

A  drawback  associated  with  this  first,  Levinson-style  approach  is  that  it  is  inefficient:  it  replaces  a  size-n 
problem  with  a  size-(2n  + 1)  problem,  requiring  twice  as  much  storage  and  computation  as  other  algorithms. 
So  next,  in  §2.2.5,  a  second,  more  efficient  and  economical  approach  is  introduced  for  real-valued  interpolation 
problems.  This  approach  is  best  referred  to  as  an  orthogonal-polynomial  type  approach. 

Survey  of  other  work  in  this  eurea:  In  one-dimensional  signal  processing,  it  has  long  been  known  that 
interpolation  problems,  which  involve  a  non-causal,  symmetric  model,  require  the  use  of  Toeplitz-plus-Hankel 
matrices  (see  [13],  [12],  and  the  survey  in  [14]).  This  T-t-H  problem  is  particularly  important  in  applications 
where  linear  phase  is  required.  Some  of  the  early  work  in  this  area  (  [13],  [12])  went  the  route  of  mapping 
a  T-l-H  matrix  onto  a  size-(2n  -|- 1)  Toeplitz  matrix;  this  is  also  essentially  the  concept  presented  in  [11], 
which  first  presented  a  lattice  filter  concept  for  interpolation.  Many  fast,  order  0{n‘^)  algorithms  have  been 
formulated  for  the  solution  of  these  one-dimensional  Toeplitz-plus-Hankel  problems  (in  addition  to  the  above, 
see  also  [11],  [15],  [16],  [17],  [18],  [19],  [20],  [21].)  In  principle,  block  versions  of  these  Toeplitz-plus- 
Hankel  algorithms  could  easily  be  applied  to  solve  standard  Markov  Random  Field  estimation  problems  for 
images.  Yet  the  use  of  such  fast  interpolation  algorithms  seems  to  be  unheard-of  in  image  processing,  which 
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is  surprising.  The  state  of  the  art  in  image  processing  is  very  diflFerent  than  the  situation  for  one-dimensional 
signal  processing,  where  the  use  of  fast  algorithms  for  linear  prediction  (causal,  one-sided  models)  is  almost 
universal,  and  covers  an  enormous  variety  of  applications. 


We  conclude  this  section  with  a  statement  of  our  basic  result.  Since  the  Levinson  algorithm  is  so  well- 
known,  we  do  not  give  the  detailed  description  of  the  steps  to  find  the  a*"  and  b™.  These  coefficients 
respectively  are  the  solutions  to  the  forward  prediction  problem  (forecasting)  and  the  backward  prediction 
problem  (postdiction  =  backcasting). 

Propostion  2.3  (Outer  Interpolation:  Levinson  Solution)  The  solution  f”  to  the  outer  interpolation 
problem  can  be  obtained  from  the  size-(2n  1)  solutions  prediction  and  postdiction 

problems:  ^ 

f"  =  Ja^""''^  -I-  (2-2.1) 

Algorithm  2.4  (Recursion  for  Inner  Interpolation)  Let  {xk)k  be  a  real  stochastic  process  with  all  co- 
variance  blocks  R"  invertible,  and  let  a™  denote  the  solution  to  the  forward  prediction  problem  of  size  m 
(obtained  e.g.  from  the  Levinson  recursion).  Then  the  solution  to  the  size-n  least-squares  inner  interpola¬ 
tion  problem 

1  =  1 

can  be  calculated  from  inductively  as: 

Step  One:  Get  new  (tx^).  A,  using  coefficients  from  preceding  stage.  By  symmetry,  only  half  the  entries 
of  these  dot  products  need  be  computed: 

^j.2n+l  ^  Jr2n+1)T  (^g2n+l  Ja^^+l) 


2\n.-l-l  _ 


2(ro  -h  r2„+2) 

=  (r2”+i  +  Jr2”+i)^ 
Step  Two:  find  new  d  and  c; 

=  - 


(aTf 


9 


r.^+1 


Step  Three:  Obtain  new,  size-(n  -I- 1)  Interpolation  Coefficients.- 

f  g^+^  \  /  \  /  5?  \ 


9i 
92^^ 


nti+l 


=  -(r+^ 


a: 


^n+2 

271-|-1 

n+3 


+  Ct. 


n— 1 


2n+l  .  ^2n+l 
^2n-fl  ^ 


+ 


^n+1 


92 


9n 


□ 


(2.2.3) 

(2.2.4) 

(2.2.5) 

(2.2.6) 


V  9lX\  J  V  1  /  VO/ 

To  introduce  the  two-dimensional  version  of  this  algorithm  in  §2.2.6,  the  expression  (2.2.6)  in  step  three 
will  also  be  written  in  terms  of  error  vectors,  that  is,  the  coefficients  of  error  polynomials.  This  relationship, 
which  is  at  the  heart  of  the  recursion,  can  also  be  expressed  more  briefly  as 


(2.2.7) 

tly  by: 

9l) 

(2.2.8) 

•  •,-Co) 

(2.2.9) 

(2.2.10) 

^Here  J  is  the  canonical  flip  matrix,  with  I’s  down  the  anti-diagonal. 
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2.2.5  ID  Interpolation  by  Orthogonal  Polynomials:  Real  Case 

Preview-  Fast  Algorithms  Derived  From  Orthogonal  Function  Theory:  In  §2.2.3,  we  saw  that 
the  most  basic  features  of  the  linear  prediction  problem  did  not  actually  depend  specifically  on  having  a 
Toeplitz  covariance  matrix.  Instead,  what  really  counts  is  that  the  matrix  in  the  prediction  system  is  a 
Gramian  matrix,  made  up  out  of  spectral  inner  products  of  basis  functions  {z\z^)s-  This  message  applies 
equally  well  to  estimating  an  optimal  interpolator,  not  just  to  the  more  familiar  problem  of  estimating  an 
optimal  predictor.  The  basic  linear  system  to  solve  for  interpolation  does  not  involve  a  covariance  matrix, 
but  rather  a  Toeplitz-plus-Hankel  matrix.  The  two  linear  systems,  for  prediction  and  for  outer  interpolation, 
do  not  look  totally  alike,  because  they  correspond  to  different  least-squares  problems.  Yet  from  the  Gramian 
viewpoint  the  prediction  system  and  the  interpolation  system  are  essentially  identical.  The  only  change 
is  that  the  Fourier  basis  {z^}k  of  unit  circle  powers  has  been  replaced  with  a  new  real-interpolation  basis 
{ip'‘}k  =  {iiz''  +  which  are  basically  cosine  functions  rather  than  powers  of  2.  Note  that  these 

are  not  actually  polynomials  in  .2;  (although  they  are  laurnomials  in  2:  and  z~^),  so  the  classical  theory  of 
orthogonal  polynomials  has  to  be  generalized  somewhat  to  work  here.  But  otherwise,  the  mathematics  is 
fairly  similar  to  the  standard  prediction  problem,  where  z’s  have  been  replaced  by  tp’s.  This  shows  the  value 
of  the  concept  of  generalized  prediction:  both  the  systems  in  the  righthand  column  are  cases  of  a  generalized 
prediction  problem,  and  they  differ  only  in  which  basis  is  being  considered.  They  can  be  solved  by  the  same 
general  principles;  the  Szegb  recipe  applies  equally  well  to  both. 

The  more  detailed  discussion  in  this  chapter  of  [1]  begins  with  an  examination  of  the  equivalence  among 
the  original  size-2n  +  1  outer  interpolation  system  in  the  previous  section,  the  more  economical  Toeplitz- 
plus-Hankel  version  of  this  system,  and  the  Gramian  view  of  these  as  generalized  prediction.  It  is  also 
shown  how  inner  interpolation  can  be  viewed  as  a  generalized  postdiction  problem,  and  a  very  general  way 
of  solving  postdiction  problems  in  terms  of  prediction  problems.  Next  there  is  a  further  exploration  of  the 
basis  {<p*}fc  =  ^  change  of  variables  away  from  the  Chebyshev  polynomials. 

Many  familiar  Fourier  facts  can  immediately  be  translated  into  Chebyshev  properties  by  this  change  of 
variables,  such  as  the  generalized  Parseval’s  equality.  We  also  show  that  with  this  change  of  variables, 
usual  transforms-  like  the  polynomial,  discrete/continuous  Fourier  or  2:  transforms-  turn  into  a  new  kind  of 
transform  which  we  call  the  Chebyshev  transform.  To  save  space  in  this  report,  we  have  omitted  both  the 
Chebyshev  transform  and  also  an  interesting  illustration  where  the  theory  can  be  applied  in  a  “turn-the- 
crank”  way  to  obtain  the  Trench  algorithm  for  Hankel  matrix  inversion  in  just  a  few  lines,  rather  than  in 
the  dozen  pages  that  Trench  required  (see  [22]).  This  example  shows  that  our  theory  really  is  more  general 
than  standard  linear  prediction,  and  can  be  used  to  derive  important  practical  results.  In  this  report,  we  do 
indicate  how  interpolation  can  be  mapped  into  generalized  prediction,  and  give  a  fast,  three-term  recurrence 
for  generalized  prediction  problems.  This  recurrence  algorithm  is  reminiscent  of  the  Chebyshev  second-order 
recurrence,  but  actually  does  not  specifically  require  that  the  basis  functions  {f^]k  be  polynomials.  It  will 
work  for  any  problem  defined  with  a  Gramian  inner-product  structure,  for  example  the  Hankel  inversion 
problem. 

Various  other  work  on  the  Toeplitz-plus-Hankel  problem  was  cited  in  §2.2.4.  There  does  not  appear  to 
be  any  published  work  solving  the  interpolation  problem  by  orthogonalizing  a  basis  of  functions  the  way 
that  is  done  here,  even  though  it  is  well-known  how  to  solve  the  prediction  problem  by  orthogonalizing  the 
unit-circle  basis. 


A  Basis  for  the  Interpolation  Problem 
property  of  the  interpolation  coefficients: 


:  A  consequence  of  stationarity  is  the  center  conjugate  symmetry 
Qk  =  ■  Prom  now  on,  we  will  assume  the  process  {xk)k  is 


real-valued,  so  that  the  terms  can  be  grouped  together  in  the  normal  equations  for  interpolation.  For  example. 
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the  mean- square  error  expression  involves 

3Efc{|a;fc  -  ^  9'iXk+i\^}  =  + 


0<|i|<n 


1-1 


I5(z) 


S[z) 


l=l 


This  suggests  that  rather  than  using  the  unit-circle  basis  {z'‘}k,  the  interpolation  problem  can  best  be 
expressed  in  terms  of  a  new  symmetrical  basis  {‘p'"}k,  defined  as 


(2)  =  1  {2*=  +  2-* }  =  I  -h  e  }  =  cos  (ke) 


(2.2.1) 


The  factor  of  1  was  introduced  to  make  the  new  basis  look  more  like  other  familiar  expressions.  With  the 
substitution  u  =  cos  (6),  these  basis  functions  in  fact  do  become  the  Chebyshev  polynomials  T'°{u).  The 

factor  of  i  can  be  counterbalanced  by  defining  d’^  =  2gk  •  Basically,  all  the  constructions  we  made  for 

prediction  using  the  circle  basis  {z*}*  now  can  be  carried  over  to  the  cosine  basis  {¥’*’}*  for  interpolation. 
For  prediction,  the  error  vectors  e”,  for  example,  involved  the  very  latest  2:"  as  a  sum  of  z°,  z^,  .. 

Now  for  interpolation,  the  spectral  normal  equations  ask  for  the  constant  basis  vector  (p°(z)  to  be  expressed 
as  a  growing  sum  of  (p^,  . . . ,  1^": 

0  =  {ed{z),‘p'‘),  l<k<n 

=  (/,  VJ")  -di{ip\ip'‘)-...-  dniip^, 

The  corresponding  augmented  matrix  system  looks  like  backward  prediction: 


(2.2.2) 
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(QtO.nl)^  "3 

For  prediction,  the  Gramian  matrix  for  the  basis  {z‘‘}k  turned  out  to  be  the  (Toeplitz)  covariance  matrix 
R  in  (2.2.1).  By  contrast,  the  Gramian  matrix  Gy,  for  interpolation  is  a  Toeplitz-plus-Hankel  (T-l-H)  matrix: 

(2.2.4) 


where  the  Hankel  matrix  is  defined  by  Hki  =  rk+i- 
normal  equations  (2.2.3)  for  interpolation  is 

/  1  \ 
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ro  +  r2 

r2  +  r2 

ri  -b  rs 

•  •  ^n-2  +  ^n+2 

\  r„  -b  r„ 

fn-l  +  rn+1 

ro  +  r2n  / 

So  an  equivalent  way  of  expressing  the  augmented 


A  Recursion  for  Interpolation:  We  can  find  the  best  least-squares  interpolation  coefficients  g  by  “fitting 
a  smaller  covariance/ Gramian  into  a  larger  one,”  in  a  way  similar  to  [11];  unlike  [11],  though,  we  do  not 
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map  the  problem  to  the  size-(2n  +  1)  Levinson  algorithm.  Inductively,  we  would  like  to  construct  a  new 
solution  to  the  size-(n  +  1)  interpolation  problem 

(  1  ^ 


/  G” 

/  _ 

\ 

V 

- - - — 

/ 

_d"+i 


) 


0 


V 


(2.2.5) 


in+l 


a, 

The  graphics  in  the  upper  left-hand  corner  is  intended  to  emphasize  that  we  have  available  to  us  the  previous, 
order-n  augmented  solution  for  the  interpolation  coefficients  in  (2.2.3).  If  we  increase  the  size  of  this  solution 
by  zero-padding,  we  get 


(2.2.6) 


with  the  “leftovers”  given  by 

8^'^^  =  Gn+lfi  ~  diGn+1,1  —  ■  ■  •  —  d^Gn+l,n 
We  will  also  make  use  of  the  size-(n  +  1)  generalized  prediction  solution; 

^  \  /  n  \ 


(2.2.7) 
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(2.2.8) 
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Since  the  right-hand  side  vectors  of  (2.2.5),  (2.2.6),  (2.2.8)  span  a  two-dimensional  subspace  when  {(7^)^+  ,  (c^  )3  / 
0,  we  can  construct  the  desired  order-(n  +  1)  coefficients  as  a  linear  combination  of  the  two  left-hand 
sides: 

=  aeS  +  /?e"+i 

A  little  linear  algebra  then  gives  the  coefficients  a  =  a”+^,  0  =  for  step  n  +  1,  as  indicated  below.  In 
the  bookkeeping  for  operation  counts,  we  will  assume  that  the  entries  Gk,i  =  2{rk-i+rk+i)  for  the  Gramian 
matrix  have  been  precomputed  and  stored  in  advance,  with  no  charge  for  table-lookup. 

Algorithm  2.5  (Interpolation  Recursion)  The  order-{n  +  1)  interpolation  coefficients 
can  be  obtained  by  the  following  procedure,  which  requires  0{8{n  +  1))  steps  per  cycle: 


1.  ((t2)”+1  =  Gn+l,n+l  -  ^ 


(2.2.9) 


fe=0 
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n 


2. 

=  Gr^+l,o-Y.{2g'^)Gr^+l,k 
k=l 

(2.2.10) 

3. 

f(T2')"+l 

n+l  )c 

(cr2)?+^  -h  c^+^<5^+i 

(2.2.11) 

1 

/on+1  d 

^  (a2)?+i-hc^+i(5”+' 

(2.2.12) 

5. 

n+1  _  f  -  \  l<k<n 

9k  j  fc  =  n-hl 

(2.2.13) 

Initializations;  c®  ■(-  •«-  Gi,o,  ffi  <- 

Corollary  2.6  Recursion  for  the  interpolation  error: 

(»");+■  =  0”+' (»’);.  Go, 0-fe 

To  complete  the  picture,  we  will  obtain  the  generalized  prediction  coefficients  c^+^ 

(2.2.14) 

from  a  three-term  recur- 

rence. 


A  Three-Term  Recurrence  for  Generalized  Prediction:  Three-term  recurrences  are  part  of  the  lore 
of  the  orthogonal  polynomials  (e.g.  [5],  §3.2);  they  are  widely  used  in  numerical  analysis  [23],  [24]  and  in 
describing  the  special  functions  of  mathematical  physics.  In  signal  processing,  split  algorithms  [25]  are  an 
example  of  this  sort  of  procedure.  To  get  an  algorithm  producing  the  generalized  prediction  coefficients 
for  the  interpolation  basis  (2.2.1),  we  start  with  the  Chebyshev-type  recursion  for  each  basis  element: 

(2.2.15) 


^"+1(2:)  =  2ip^{z)^p^{z)  -  <p”  \z) 


This  is  just  a  statement  of  the  trig  identity  2  cos^  cos  kO  =  cos(fe  +  1)0  +  cos{k  -  1)0.  It  can  be  used  to 
produce  a  recursion  for  the  error  polynomials  e"  in  (2.2.13): 

Algorithm  2.7  (Three-Term  Recurrence)  The  error  polynomial  e"+^(^)  ean  be  generated  from  the  two 
preceding  error  polynomials  e’fjz),  e^-'^jz)  in  0(8n)  steps  from  the  second-order  recursion: 

(2.2.16) 


e^+Hz)  =  (2(/ji(^)  +  5”)e"(^)  + 


The  coefficients  in  this  recursion  are  calculated  via 


ri  —  l 


1.  (o"^)”  =  (^"i  ®”)  ^n,n  ~  ^Gn,k 


k-0 


2. 


3. 


P"  =  +  7^  [E  <^VGn+l,k  -  G«+l.n] 

'  '  *;=0 


C"  =  - 


((72) 


.2\n-l 


(2.2.17) 

(2.2.18) 
(2.2.19) 
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Initializations;  e°  =  =  1,  ■(—  Go,o 

=  ff'-  -  Co,  Co  <-  <-  Gi,i  -  -77— 

Croo  '-'00 

In  all  these  expressions,  the  elements  of  the  Toeplitz-plus-Hankel  Gramian  matrix  are  given  by  Gk,i  =  2y'^k-i  + 
rk+i)- 
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Sketch  of  proof.  It  is  easy  to  see  that  the  coefficient  of  the  highest-order  </?  vanishes  in  the  function  q{z)  = 
e'"+^{z)  -  e"(^).  So  a  Gram-Schmidt  expansion  of  qiz)  in  terms  of  the  preceding  error  polynomials 

must  look  like 

lt=0  '  ’  '  fc=o  ^  ,  ,,  J 

Thanks  to  orthogonality,  only  the  two  highest-order  dot  products  on  the  right  are  nonzero;  these  correspond 
to  the  coefficients  called  S",  C". 


2.2.6  A  Levinson- Type  Recursion  for  Interpolation  in  Two  Dimensions 

In  this  section,  we  will  go  through  the  two-dimensional  generalization  of  the  Szego  orthogonalization  ideas 
presented  in  §2.2.3,  with  the  goal  of  finding  fast  methods  to  solve  the  normal  equations  for  interpolation.  The 
basic  ideas  fortunately  turn  out  to  be  pretty  much  the  same  as  before;  they  just  require  more  indices  to  be 
expressed.  In  keeping  with  the  style  of  previous  sections,  we  will  provide  a  certain  amount  of  representative 
material  in  this  report,  but  omit  much  of  the  technical  detail.  In  particular,  the  important  definitions 
regarding  orthogonality  are  made  with  enough  detail  to  be  actually  intelligible,  but  the  details  about  the 
block  structure  of  a  two-dimensional  covariance  matrix  are  completely  suppressed.  For  this  and  the  form 
of  the  standard  Levinson  recursion  in  two  dimensions,  the  reader  is  referred  as  usual  to  [1].  This  section 
concludes  by  giving  just  enough  detail  about  a  Levinson-type  recursion  algorithm  in  two  dimensions  so  that 
the  reader  may  verify  its  basic  similarity  to  algorithm  2.4. 


Inner  Products  and  Orthogonalization  in  Two  Dimensions:  The  essential  idea  for  the  Szego  or¬ 
thogonalization  process  was  to  express  the  normal  equations  in  frequency  space.  To  make  sense  of  this  in 
two  dimensions,  we  need  to  look  at  the  two-dimensional  spectral  and  covariance  inner  products,  which  relate 
expectations  in  the  “time”  domain  to  algebra  with  z’s  in  the  frequency  domain.  Recall  that  for  a  general 
2D  discrete  random  process  {xk,i)k,i,  the  general  definition  of  a  covariance  element  is 

R(i,i),(fc,()  =  {xij  -  fiij)  {xk,i  -  Pk,i)  }  (2-2-1) 

Here,  the  mean  of  the  process  at  pixel  {i,j)  is  denoted  by  '‘'fJ.ij.'’  Note  that  the  random  process  is  not 
assumed  stationary  in  (2.2.1),  so  the  full-blown  covariance  “matrix”  actually  requires  four  indices,  rather 
than  two.  We  could  make  it  look  more  like  a  familiar  one-dimensional  covariance  by  using  multi-index 

notation  r  1  f  I 

Ra,b  =  1e{  (a:a  -  Ma)  {xb  “  Mb)  |  =  1E|  ^1,02  “  Mai.aj)  (2:61,62  “  tibubi)  |  (2-2-2) 


where  a  =  (ai,  02)  and  b  =  (61, 62)  are  simple,  shorthand  ways  of  denoting  a  vector  index,  consisting  of  two 
component  indices. 

Recall  that  for  a  two-dimensional  random  process  {xk,t)k,i  to  be  second-order  stationary,  the  second-order 
statistics  should  be  shift-invariant: _ 

JE[{xi+pj+g-pi+p,j+,){xij-Pi,j)}  =  rp.„  independent  of  (i,i)  (2.2.3) 

We  will  now  drop  the  mean  and  denote  this  more  briefly  as 

lE^  j+g  Xj.j  }  Xp,q  (2.2.4) 

by  replacing  the  process  {xk,i)k,i  by  the  process  yk,i  —  ixk,i)k,i~ tbk,i  (covariance  matrix  =  correlation  matrix). 
The  two-dimensional  stationarity  property  can  also  be  expressed  as 


In  other  words,  the  stationary  covariance  array  {rk,i)k,i  involves  only  two  indices,  rather  than  the  four  needed 
in  (2.2.1),  and  is  an  ordinary  matrix.  The  multi-index  version  of  (2.2.5)  is  simply  Ra,b  =  ?’a-b- 
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The  two-dimensional  covariance  sequence  and  the  two-dimensional  spectral  density  are  related  by  a  two- 
dimensional  Z-transform;  oo  oo  oo  oo 

S(zuZ2)=  f:  E  =  E  (2.2.6) 

k=—ool——oo  kz=—ool=—oo 

where  the  second  term  on  the  right  is  obtained  by  taking  Zm  to  be  the  unit-circle  variable  m  =  1,2. 

The  multi-index  version  of  this  fact  is  denoted  ^  ^ 

S{z)  =  E  E  E  ^  (2.2.7) 

mezixz  mi— oom2=— oo 

More  generally,  the  multi-index  notation 

x"  =  (2.2.8) 

is  useful  for  denoting  polynomials  in  the  variables  xi,  X2,  ■■  •  ,iCp,  i.e.  with  p-dimensional  indices. 


Two-Dimensional  Spectral  Inner  Product:  Fourier  inversion  is  as  straightforward  in  two  dimensions 
as  it  is  in  one  dimension,  giving  a  way  of  recovering  the  covariance  sequence  {rk,i)k,i  from  the  spectral  density 
■5(^1,  ^2),  which  is  periodic  in  each  variable  separately: 


=  j  j  zt  z^^Sizi,  Z2) 


dzi 


dZ2 


277121  277722 


(2.2.9) 


In  the  second  integral  on  the  right,  Ti  and  r2  are  simple  closed  curves  enclosing  the  unit  circles  \zi\  = 
|e*®i|  =  1,  respectively  |2;2|  =  =  1-  ^ 

In  two  dimensions,  the  basis  to  orthogonalize  for  prediction  is  now  the  set  of  all  monomials  ^2^  where 
ki,  k2  are  integers.  Earlier,  we  arrived  at  what  the  definition  of  the  spectral  inner  product  should  be  by 
looking  at  its  behavior  for  the  unit-circle  basis,  and  we  can  do  the  same  thing  in  two  dimensions.  In  both 
cases,  this  can  be  done  by  rewriting  the  basic  Fourier  inversion  integral  above  to  look  more  like  an  inner 
product  of  basis  functions: 

11(0,6),  (ife.O  =  ‘fa-k,b-l  ~  ^  ^2  “^(^l)  .2:2)  ^1  ^2  2^1*7  (2.2.10) 

The  integral  on  the  right  will  be  taken  as  the  definition  of  the  spectral  inner  product  of  the  basis  functions 
zl  Z2  and  z}  z^: 

=  (.f  4.44),  =  (G., 

This  is  a  reminder  that  the  Gramian  matrix  of  inner  products  of  the  basis  functions  depends  on  four  indices, 
just  the  way  the  covariance  does  in  (2.2.1)-  this  is  hardly  surprising,  considering  that  the  covariance  should 
be  equal  to  this  Gramian.  In  multi-index  notation,  this  can  be  written 

=  (g.)„„  (2.2.12) 

The  definition  of  the  spectral  inner  product  on  basis  functions  (z^  4)*,(  can  be  extended  to  functions  that 


are  square-integrable  with  respect  to  the  measure 

dfx  =  S{zu  Z2)  1  .  J  £  1  V, 

That  is,  the  spectral  inner  product  of  ^(^i,  Z2)  and  G(zi,  Z2)  is  defined  to  be 

(^F(zi,  Z2),  G(zi,  Z2)  )s  ~  2^  2^  (2.2.13) 

As  in  one  dimension,  the  spectral  inner  product  in  2D  frequency  space  can  be  related  to  a  “time-domain” 
inner  product.  This  is  the  idea  of  the  ID  generalized  Parseval’s  equality  (2.2.5),  which  we  would  like  to 
carry  through  to  two  dimensions.  Rather  than  speak  about  the  “time  domain”  as  being  the  inverse  of  the 
two-dimensional  frequency  domain,  for  2D  images  it  makes  more  sense  to  talk  the  “space  domain”  being  a 
transform  away  from  the  frequency  domain. 

In  §2.2.3,  the  one-dimensional  spectral  inner  product  {F{z),  G{z))s  was  found  to  correspond  to  the  inner 
product  of  vectors  (f,  g)j^  =  F  Rg •  Now  that  we  are  living  in  two  dimensions,  we  should  replace  a  singly- 
indexed  vector  f  =  {fk}k=o  a  doubly-indexed  array  f  =  {fk,i}k,h  say  for  0  <  fe  <  m,  0  <  1  <  n.  We 
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will  put  a  arrow  above  these  doubly-indexed  vectors  here,  as  a  reminder  that  they  are  not  garden-variety 
vectors.  Likewise,  an  arrow  above  the  covariance  matrix  R  is  a  reminder  that  it  is  quadruply-subscripted, 
as  in  (2.2.1).  The  two-dimensional  covariance  block  inner  product,  which  is  a  direct  generalization  of 
2.2.4,  then  can  be  written 

(fig)-  =  fa,b^{a,b),(k,l)W^  =  fa,b  ’’’a-k^b-l  9k, I  (2.2.14) 


0<a<m  0<Aj<Tn 
0<6<n  0<l<n 


0<a<Tn  0<fc<r 
0<fc<n  0<I<n 


where  we  have  used  the  stationarity  property  (2.2.5). 

In  order  to  derive  a  two-dimensional  Parseval’s  relation,  the  two-dimensional  polynomial  transforms  of 
the  multi- vectors  f,  g  are  needed: 

m  n 

f  =  (/a,6)o<»<.n  V{j)  =  Y.  Z-,)  (2.2.15) 


a=0  6=0 
m  n 


g  =  {9k,l)o<k<r 


Fig)  =  E  ^2  =  G{z1,  Z2) 


(2.2.16) 


*=0  z=o 


K — U  I  —  U 

Rather  than  having  the  sums  bristle  with  indices,  we  may  just  write  “(A;,  /)”  on  the  bottom  of  a  sum  involving 
g,  etc.  It  is  now  easy  to  establish  a  two-dimensional  generalized  Parseval  relation,  based  on  the  fact  (2.2.11) 
that  ra,b  =  (z®i  z**): 

(fig)-  =  E  E  'b‘a-k,b-l  WJ 

^  (a,b)  (k,l) 

=  E  E  4 4 =  { E  4 4.  E  4 

(a,b)  (k,l)  (a,b)  {k,l) 

=  (^F{zi,  Z2),  G{zi,  Z2)'^ ^ 

We  did  not  even  have  to  write  any  unit-circle  contour  integrals  out  explicitly  here:  what  really  counts  in 
the  proof  is  properties  of  the  inner  products  of  basis  functions.  As  a  summary,  the  full  statement  of  the  2D 
generalized  Parseval’s  relation,  including  integrals,  is 


(f,  =  (r(2l,  Z,),  G(2i,  2,))^ 

EE/-^-‘.‘-'  9k, I  “  /  /  F{zi,  Z2)  S{zi,  Z2)  G{zi,  Z2) 

{  (t  t\  *^^2 

(0,6)  {k,l) 


(2.2.18) 


2'Kiz\  2'Kiz2 


Sketch  of  Block  Recursion  for  2D  Interpolation:  With  the  usual  conventions  for  block  covariances, 
the  recursion  for  2D  interpolation  coefficients  can  be  expressed  in  terms  of  block  error  vectors.  For  outer 
and  inner  interpolation,  these  are  defined  respectively  by 

=  (l  I -F!l„ - F;j|i)  =  (i|  |l)  (2.2.19) 

e"  1=  ( 0  I  -G"„ - G:i|o)  =  (o|  e”  |  o)  (2-2.20) 

gn+l  ^  _Gn+l_^  I  _Gn+l  .  .  .  _  Qn+l  |  _G-+1  )  (2.2.21) 

Underlining  is  used  to  indicate  that  these  error  vectors  are  in  fact  arrays  each  of  whose  components  is  a 
matrix  block. 

The  algorithm  makes  use  of  the  solution  to  outer  interpolation,  which  can  be  obtained  from  the  standard 
multichannel  Levinson  solution  by 

=  (a2”+i  J)  (2.2.22) 

Formally,  the  algorithm  is  very  similar  to  (2.2.7). 
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Algorithm  2,8  (Recursion  for  Interpolation  in  Two  Dimensions) 

The  order-{n  -{- 1)  error  vector  for  two-dimensional  inner  interpolation  coefficients  can  be  obtained  from  the 
order-n  inner  interpolation  error  vector  and  the  multichannel  Levinson  solution  via 


^Ti+i  — 
-9 


C  eg  +  De” 


,n+l 


Step  One:  Get  new  S,  A,  using  coefficients  from  preceding  stage  and  Levinson  solution: 
=  2  Ro  +  R-2n-2  +  R2n+2  -  T  [(£)+•"'  +  l]  ^ 


^p”++Ar-  = 

Step  Two:  find  new  D  and  C; 

D  =  A”+^  [  A”+^ 


T-1 


=  [f^  +  A"+^ 

c  =  (F^r's;?  [a”+i  +  (Fo"r's;?]"'  =  [ 


:^n\  — 1 


_1 


.  (S/) 

An+l  (S«)-lF^  +  l] 


_1 


Having  obtained  the  new,  order-{n  +  1)  coefficients  D^n  +  1)  and  C'n  +  1)  from  steps  one  and 
recursion  now  produces  a  new,  higher-order  error  vector  from  (2.2.24).  This  block  error  vector  6^+^ 
the  new,  order-{n  +  1)  block  coefficients  for  the  optimum  interpolator  (=  Step  Three). 


(2.2.23) 

(2.2.24) 

(2.2.25) 

(2.2.26) 

(2.2.27) 

two,  the 
contains 


References 

[1]  M.  M.  Scheffe.  Interpolation,  Orthogonalization  and  Random  Fields.  PhD  thesis,  Brown  University, 
Division  of  Engineering,  1996.  Available  from  University  Microfilms,  300  N  Zeeb  Rd,  Ann  Arbor  MI 
48103-1500,  1-800-824-0814. 

[2]  T.  Kailath,  A.  Viera,  and  M.  Morf.  Inverses  of  Toeplitz  operators,  innovations,  and  orthogonal  polyno¬ 
mials.  SIAM  Review,  20(1):106-119,  1978. 

[3]  Philippe  Delsarte  and  Yves  Genin.  On  the  role  of  orthogonal  polynomials  on  the  unit  circle  in  digital 
signal  processing  applications.  In  Paul  Nevai,  editor.  Orthogonal  Polynomials:  Theory  and  Practice, 
pages  115-133.  Kluwer  Academic  Publishers,  1990.  (Proceedings  of  the  NATO  Advanced  Study  Institute 
on  Orthogonal  Polynomials  and  Their  Applications,  Columbus,  Ohio,  May  22- June  3,  1989). 

[4]  Thomas  Kailath.  Notes  on  the  Szego  unit  circle  orthogonal  polynomials  in  least-squares  prediction 
theory.  In  Richard  Askey,  editor,  Gabor  Szego:  Collected  Papers,  volume  I,  pages  43-46.  Birkhauser, 
1992. 

[5]  Gabor  Szego.  Orthogonal  Polynomials.  American  Mathematical  Society,  Providence,  R.I.,  4th  edition, 
1975. 

[6]  L.  Ya.  Geronimus.  Orthogonal  Polynomials.  Consultants  Bureau,  New  York,  1961. 

[7]  Ulf  Grenander  and  Gabor  Szego.  Toeplitz  Forms  and  Their  Applications.  University  of  California  Press, 
1958. 

[8]  T.  Kailath.  A  view  of  three  decades  of  linear  filtering  theory.  IEEE  Trans.  IT,  20:145-181,  1974. 

[9]  Athanasios  Papoulis.  Probability,  Random  Variables,  and  Stochastic  Processes.  McGraw-Hill,  1984. 

[10]  Theodore  S.  Chihara.  An  Introduction  to  the  Orthogonal  Polynomials.  Gordon  and  Breach,  1978. 

[11]  Cameron  K.  Coursey  and  John  A.  Stuller.  Linear  interpolation  lattice.  IEEE  Transactions  on  Signal 
Processing,  39(4):965-967,  1991. 


31 


[12]  Gulamabbas  A.  Merchant  and  Thomas  W.  Parks.  Efficient  solution  of  a  Toeplitz-Plus-Hankel  coefficient 
matrix  system  of  equations.  IEEE  Trans.  ASSP,  30(l):40-44,  1982. 

[13]  S.  Lawrence  Marple.  Fast  algorithms  for  linear  prediction  and  system  identification  filters  with  linear 
phase.  IEEE  Trans.  ASSP,  30(6):942-953, 1982. 

[14]  Andrew  E.  Yagle.  Fast  algorithms  for  structured  matrices  in  signal  processing.  In  N.  K.  Bose  and  C.  R. 
Rao,  editors,  Handbook  of  Statistics,  Vol.  10:  Signal  Processing,  pages  933-973.  Elsevier,  1993. 

[15]  Benjamin  Friedlander  and  Martin  Morf.  Least  squares  algorithms  for  adaptive  linear-phase  filtering. 
IEEE  Trans.  ASSP,  30(3):381-389,  1982. 

[16]  Georg  Heinig,  Peter  Jankowski,  and  Karla  Rost.  Fast  inversion  algorithms  of  Toeplitz-plus-Hankel 
matrices.  Numer.  Math,  52:665-682,  1988. 

[17]  Jin- Jen  Hsue  and  Andrew  E.  Yagle.  Fast  algorithms  for  close-to-Toeplitz-plus-Hankel  systems  and 
two-sided  linear  prediction.  IEEE  Transactions  on  Signal  Processing,  41(7):2349-2361, 1993. 

[18]  Steven  M.  Kay.  Some  results  in  linear  interpolation  theory.  IEEE  Trans.  ASSP,  31:746-749,  31. 

[19]  Ali  H.  Sayed,  Hanoch  Lev-Ari,  and  Thomas  Kailath.  Fast  triangular  factorization  of  the  sum  of  quasi- 
Toeplitz  and  quasi-Hankel  matrices.  Linear  Algebra  and  Its  Applications,  191:77-106,  1993. 

[20]  Christopher  J.  Zarowski.  Schur  algorithms  for  Hermitian  Toeplitz,  and  Hankel  matrices  with  singular 
leading  principal  submatrices.  IEEE  Transactions  on  Signal  Processing,  39(ll):2464-2480, 1991. 

[21]  Christopher  J.  Zarowski.  A  Schur  algorithm  and  linearly  connected  processor  array  for  Toeplitz-plus- 
Hankel  matrices.  IEEE  Transactions  on  Signal  Processing,  40(8):2065-2078, 1992. 

[22]  William  F.  Trench.  An  algorithm  for  the  inversion  of  finite  Hankel  matrices.  J.  Soc.  Indust.  Appl. 
Math.,  13(4):1102-1107, 1965. 

[23]  Walter  Gautschi.  Computational  aspects  of  three-term  recurrence  relations.  SIAM  Review,  9(l):24-82, 
1967. 

[24]  Walter  Gautschi.  A  survey  of  Gauss-ChristoflFel  quadrature  formulas.  In  P.  L.  Butzer  and  F.  Feher, 
editors,  E.  B.  Christoffel:  The  Influence  of  His  Work  on  Mathematics  and  the  Physical  Sciences,  pages 
72-147.  Birkhauser,  1981. 

[25]  Philippe  Delsarte  and  Yves  Genin.  A  survey  of  the  split  approach  based  techniques  in  digital  signal 
processing  applications.  Phillips  J.  Res.,  43:346-374,  1988. 


32 


